Morph Man

From Learning Japanese Wiki (RtKWiki)

What is it?

In essence, it is a plugin that makes Anki present new cards in optimal order.

In detail, it is a system that keeps track of what you know (in terms of morphemes, which are like words or particles) across your entire collection and updates card fields with their current:

  • k+N and m+N values: the number of unique morphemes on the card that you don't know (known+N, or k+N) and the number you know but whose knowledge isn't yet mature (mature+N, or m+N).
  • a list of the unknown / unmature morphemes
  • focus morpheme - for i+1 cards, the particular morpheme you need to learn to make that sentence i+0
  • morph man index - an overall suggested order to learn sentences in, which it uses to modify new cards' "due" value to make new cards appear in order of difficulty
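As a rough illustration of the fields above, the following sketch computes k+N, m+N, the unknown/unmature lists, and the focus morpheme for a single sentence. It assumes morphemes are plain strings and that the known and mature databases are Python sets; the function name `analyze_sentence` and the sample data are hypothetical and are not MorphMan's actual internals.

```python
def analyze_sentence(morphemes, known, mature):
    """Return MorphMan-style stats for one sentence (illustrative only)."""
    unique = list(dict.fromkeys(morphemes))  # de-duplicate, keep order
    unknowns = [m for m in unique if m not in known]
    unmatures = [m for m in unique if m not in mature]
    return {
        "k+N": len(unknowns),
        "m+N": len(unmatures),
        "unknowns": unknowns,
        "unmatures": unmatures,
        # A focus morpheme only exists for k+1 sentences.
        "focus": unknowns[0] if len(unknowns) == 1 else None,
    }

# Hypothetical databases: 食べる is known but not yet mature.
known = {"私", "は", "を", "食べる"}
mature = {"私", "は", "を"}

stats = analyze_sentence(["私", "は", "寿司", "を", "食べる"], known, mature)
print(stats["k+N"], stats["m+N"], stats["focus"])  # 1 2 寿司
```

Here the sentence is k+1 (only 寿司 is unknown), so 寿司 is its focus morpheme; learning it would make the sentence k+0.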

It also contains a suite of tools for doing other morpheme based analysis, such as:

  • manager for analyzing, comparing, merging dbs, and creating dbs from text files
  • adaptive subs generator for creating custom subtitles files that show the Japanese subs for lines you know (i+0) and English/other subs for lines you don't, based on known.db and mature.db.
  • search for quickly finding alternative sentences in your collection to learn a vocab item from

Requirements

How to use it

Default Morph Man 3 fields

  • k+N
  • m+N
  • unknowns
  • unmatures
  • unknownFreq
  • morphManIndex

These fields can be modified in the Morph Man 3 config file.

Once you download the plugin, modify config.py, and restart Anki, you can invoke the calculator via the menu or Ctrl+M to:

  1. Generate an all.db that contains all the morphemes you know based on your Anki collection
  2. Merge in external.db which is a user managed database that tracks outside-Anki knowledge
  3. Add morphemes from cards of at least some maturity (configurable with the "threshold_known" setting) to your known.db as well as a similar operation for mature.db
  4. Update the fields in your notes with the new information it gathered and re-sort the order in which new cards will appear by modifying their due value
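Steps 1 to 3 above can be pictured with a small sketch. It assumes each database is a mapping from morpheme to a maturity in days, that a morpheme's maturity is the largest interval of any card containing it, and that a threshold is met when the maturity is greater than or equal to it. The function `recalc`, the data layout, and the threshold values are hypothetical; the real plugin stores richer data and also performs step 4 (rewriting note fields and due values), which is omitted here.

```python
def recalc(cards, external_db, threshold_known, threshold_mature):
    """cards: iterable of (morphemes, interval_in_days) pairs."""
    # Step 1: all.db records the best maturity seen for each morpheme.
    all_db = {}
    for morphemes, interval in cards:
        for m in morphemes:
            all_db[m] = max(all_db.get(m, 0), interval)

    # Step 2: merge in knowledge gained outside Anki (external.db).
    merged = dict(all_db)
    for m, maturity in external_db.items():
        merged[m] = max(merged.get(m, 0), maturity)

    # Step 3: derive known.db and mature.db from the thresholds.
    known_db = {m for m, mat in merged.items() if mat >= threshold_known}
    mature_db = {m for m, mat in merged.items() if mat >= threshold_mature}
    return all_db, known_db, mature_db

# Hypothetical collection: (morphemes on the card, card interval in days).
cards = [(["猫", "が"], 30), (["犬", "が"], 2), (["鳥"], 0)]
external_db = {"本": 21}

all_db, known_db, mature_db = recalc(
    cards, external_db, threshold_known=1, threshold_mature=21
)
print(sorted(known_db))
print(sorted(mature_db))
```

With these made-up numbers, 犬 counts as known but not mature, 鳥 is neither, and 本 enters both sets purely through external.db.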

Configuration

MorphMan 3 is configured through config.py (Anki/addons/morph/config.py), which contains a Python dictionary called 'default' holding the various strings and lists that control MorphMan. You can also specify overrides based on the profile, model, or deck (deck has priority over model, which has priority over profile, which has priority over default). Not all fields work for all override levels. Instructions on how Python data structures like dictionaries work can be found online, for example at http://docs.python.org/2/tutorial/datastructures.html#dictionaries

Example 1: Disable MorphMan calculation for everything but a particular model

 default['enabled'] = False
 model_overrides['subs2srs model'] = { 'enabled':True }

Example 2: Analyze 'Expression' field normally, but also analyze 'Extras' for 'Genki' model cards

 model_overrides['Genki'] = { 'morph_fields': [ u'Expression', u'Extras' ] }

Example 3: Disable MorphMan for a particular profile

 profile_overrides['User 1'] = { 'enabled': False }
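The override priority described above (deck over model over profile over default) can be thought of as successive dictionary updates, applied from lowest to highest priority. The sketch below is only a mental model: the helper `resolve_config` is hypothetical and is not a function provided by MorphMan, and the settings shown are chosen purely for illustration.

```python
def resolve_config(default, profile_cfg=None, model_cfg=None, deck_cfg=None):
    """Merge settings from lowest to highest priority (illustrative only)."""
    cfg = dict(default)
    for layer in (profile_cfg, model_cfg, deck_cfg):
        if layer:
            cfg.update(layer)  # later (higher-priority) layers win
    return cfg

default = {'enabled': False, 'morph_fields': [u'Expression']}
profile_overrides = {'User 1': {'enabled': False}}
model_overrides = {
    'Genki': {'enabled': True, 'morph_fields': [u'Expression', u'Extras']},
}

cfg = resolve_config(
    default,
    profile_cfg=profile_overrides.get('User 1'),
    model_cfg=model_overrides.get('Genki'),
)
print(cfg['enabled'], cfg['morph_fields'])
```

Because the model layer is applied after the profile layer, a 'Genki' note ends up enabled even though the profile disables MorphMan.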

FAQ

How can I change which fields are used in the analysis of my deck?

Change "morph_fields", which is a Python list of the field names (as strings) to analyze.

For example the default is

 'morph_fields':[u'Expression']

you could also do

 'morph_fields':[u'Context before', u'Expression', u'Context after']

What are these thresholds?

You can specify the minimum interval a card must reach before the plugin considers it mature (note: Anki internally uses 21 days), known, or "seen".

What if I want to manually add things to my known.db?

Interval databases such as known.db, mature.db, and seen.db should be auto-generated by MorphMan based on the data in all.db (also auto-generated based on your Anki collection) and external.db (a user managed database for manually adding knowledge learnt outside Anki). Thus, you should add to external.db and re-run MorphMan's calculation to regenerate known.db.

Of course, it's good practice to separate external knowledge into different databases and then merge them into external.db. For example, you could merge external.genki.db + external.rikachan.db + external.lightNovels.db into external.db. This makes it easier to make small updates to external.db without starting over from scratch.

How can I tell what MorphMan is doing when it's processing?

You can inspect the log file (morphman.log in your profile's directory).

How is Morph Man Index calculated?

Morph Man Index is currently based on the number of unique unknown morphemes, how much the sentence's length (in morphemes) differs from a target length of your choosing (i.e., to avoid very short or very long sentences), and the frequency at which the unknown morphemes come up in other notes you're trying to learn. Thus the formula is:

 mmi = 10000*N_k + 1000*lenDiff + freq
   where
       mmi = Morph Man Index
       N_k = number of unique unknown morphemes
       N = your chosen target sentence length, in morphemes
       lenDiff = max( 0, min( 9, abs( N - length_of_sentence ) -2 ) )
       freq = 999 - min( 999, avg_freq_of_focus_morpheme_in_all_cards )
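The formula above translates directly into a few lines of Python. This is a sketch rather than the plugin's own code: the function and parameter names are made up, and the default target length of 4 is taken from the "optimal sentence length" default mentioned in the changelog.

```python
def morph_man_index(n_unknown, sentence_length, avg_focus_freq, target_length=4):
    """Compute mmi from the formula above (names are illustrative)."""
    # Sentences within 2 morphemes of the target length carry no penalty;
    # the penalty is capped at 9.
    len_diff = max(0, min(9, abs(target_length - sentence_length) - 2))
    # More frequent focus morphemes give a smaller value.
    freq = 999 - min(999, avg_focus_freq)
    return 10000 * n_unknown + 1000 * len_diff + freq

print(morph_man_index(1, 4, 12))   # 10987
print(morph_man_index(1, 10, 12))  # 14987
print(morph_man_index(2, 4, 500))  # 20499
```

Because the unknown count is weighted most heavily, any k+1 sentence gets a smaller index than any k+2 sentence; length and frequency only break ties within the same k+N level.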

I'm struggling reading some native text, what can Morph Man do for me?

Say you want to read some text but are struggling. You can use Morph Man to help you figure out what's causing trouble and direct your learning.

  1. Get an electronic text copy of what you want to read and create a DB from it with Morph Man. Load this as DB A and your mature.db or known.db as DB B, then compute A-B to see the morphemes you don't know.
  2. Use the part of speech whitelist to show only particles (助詞) to see if there's any you're unfamiliar with. If any, look these up in a good grammar reference book and study some examples.
  3. Afterwards, see how many nouns / verbs / other vocab you're missing (there's a POS breakdown on the right panel and you can put these in the whitelist to filter as well).

You can use this information to help you just generally prioritize your studies or you could get fancy with the Matched Morpheme feature (not yet in MorphMan 3.0).
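Conceptually, the A-B comparison and the part-of-speech filtering in the steps above are set operations. The sketch below models each morpheme as a hypothetical (base form, part of speech) pair and uses made-up sample data; in practice you would do this through the Morph Man manager window rather than in code.

```python
from collections import Counter

# DB A: morphemes extracted from the text you want to read (sample data).
text_db = {
    ("が", "助詞"), ("こそ", "助詞"), ("猫", "名詞"),
    ("走る", "動詞"), ("机", "名詞"),
}
# DB B: your known.db (sample data).
known_db = {("が", "助詞"), ("猫", "名詞")}

# A - B: everything in the text that you don't know yet.
unknown = text_db - known_db

# Whitelist only particles to check for unfamiliar grammar first.
pos_whitelist = {"助詞"}
unknown_particles = {m for m in unknown if m[1] in pos_whitelist}

# Part-of-speech breakdown of what is still missing.
pos_breakdown = Counter(pos for _, pos in unknown)

print(unknown_particles)
print(dict(pos_breakdown))
```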

What are adaptive subs?

Adaptive subs are custom-made subtitle files, based on your personal known.db and mature.db, that combine subtitles from Japanese and another language. To use it:

  1. Open Morph Man > Adaptive Subs
  2. Set the part of speech black/whitelist and template for how to output mature/known/unknown lines.
  3. Click convert and select a dueling subtitles file in .ass format (you can make these with subs2srs) and output file path.

For the template format, you can use the following macros:

jpn 
the Japanese version of the line
eng 
the English/other version of the line
N_k 
number of unknown morphemes (i+N known)
N_m 
number of unmature morphemes (i+N mature)
unknowns 
the unknown morphemes delimited by 2 spaces
unmatures 
the unmature morphemes delimited by 2 spaces

Example formats to try

minimalist 

Shows English subs if you don't fully know a line, otherwise the Japanese subs. Shows no subs if the line is very well known.

* mature = ""
* known = "%(jpn)s"
* unknown = "%(eng)s"

keep track of learning

Shows Japanese subs if known and a number at the end with how many unmatures (if any), otherwise shows English version with number of unknowns.

* mature = "%(jpn)s"
* known = "%(jpn)s [%(N_m)s]"
* unknown = "%(eng)s [%(N_k)s]"

debugging

Shows the English version with a list of unknowns and the total count at the end if the line isn't fully known. Otherwise it shows the Japanese version, but includes the English line as well if the line isn't very well known.

* mature = ""
* known = "%(jpn)s [%(eng)s]"
* unknown = "%(eng)s [%(unknowns)s] %(N_k)s"
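The macros above have the shape of Python's %-style named placeholders, so the effect of a template can be previewed with ordinary string formatting. The sketch below is an assumption about how a line might be rendered, not MorphMan's actual code: the `render_line` helper is hypothetical, as is the rule that a line counts as "unknown" when N_k > 0, "known" when only N_m > 0, and "mature" otherwise.

```python
def render_line(line, templates):
    """Pick a template by knowledge level and fill in the macros."""
    if line["N_k"] > 0:
        level = "unknown"
    elif line["N_m"] > 0:
        level = "known"
    else:
        level = "mature"
    return templates[level] % line

# The "keep track of learning" format from above.
templates = {
    "mature": "%(jpn)s",
    "known": "%(jpn)s [%(N_m)s]",
    "unknown": "%(eng)s [%(N_k)s]",
}

# Sample subtitle line with one unmature morpheme and no unknowns.
line = {
    "jpn": "猫が好きです", "eng": "I like cats",
    "N_k": 0, "N_m": 1, "unknowns": "", "unmatures": "好き",
}
print(render_line(line, templates))  # 猫が好きです [1]
```

A line with N_k of 2 would instead be rendered from the "unknown" template as "I like cats [2]".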

Development status

Current version is v3.3

Changelog

3.1 to 3.3

  • Track recently seen focus morphemes (since Anki was opened) and automatically skip new cards with the same focus (i.e., "alternatives"). Can be enabled/disabled per deck with the 'next new card feature' option. Can also display how many cards will be skipped via the 'print number of alternatives skipped' option.
  • 'next new card feature' also alters Anki's selection of new cards to only pick those which have a focus morpheme and are k+1
  • Hit "K" to tag a card as already known, skipping it (and thus its alternatives, if any, as per above) and letting Morph Man consider it mature in future recalcs. You can also add this "alreadyKnown" tag to cards manually.
  • Hit "L" to search for cards with the same focus morph, useful for finding alternative sentences to learn a word from or simply see more examples of it
  • Ctrl-Shift-N in the browser to immediately review the selected cards. This is useful if you decide to learn a word from an alternative sentence after using "L" (above) during review, or came across some word outside Anki and want to immediately practice it.
  • Ctrl-Shift-P to batch play all the media of the selected cards. This is useful if you want to quickly see many examples of a new word or grammar point in action.
  • 'new card merged fill' feature allows you to study a parent deck and have it pull new cards from all child decks at once (whereas normally Anki will sequentially pull from the first deck until it has hit the daily limit, then move on to the next, etc.). Also automatically skips cards whose "due" isn't above some configurable threshold.
  • Databases created from text files can have maturity assigned to the morphemes
  • Cards in "learning" status now have maturity assigned (set to 0.5 days) instead of 0, making sub-day "seen" and "known" thresholds more useful
  • Can save all.db, mature.db, known.db, and seen.db
  • Can configure optimal sentence length (defaults to 4 morphemes)
  • "Lite" mode, which only modifies notes that are k+2 or better in order to reduce how much data must be synced
  • bugfixes and/or improved error messages for many issues

Known issues

  1. To optimize speed of successive calculations, MorphMan caches data aggressively which increases memory usage (~35 MB for a 9000 note collection). In the future there will be a config option to turn this off.

Troubleshooting

IOError: [Errno 32] Broken pipe

This error appears when mecab isn't working properly, likely due to some configuration error. It is most common when the Japanese support addon isn't properly installed but the addon still found an improperly configured or incompatible version of mecab.

OSError: [Errno 2] そのようなファイルやディレクトリはありません (No such file or directory)

This error appears when the Japanese support addon isn't installed, or when the computer you're running on doesn't have 32-bit binary support. If you're on a 64-bit machine, first try installing the libraries for 32-bit support.

SyntaxError: Non-ASCII character

This error should complain about a specific file, usually main.py. To fix it, add the line: "# -*- coding: utf-8 -*-" to the top of the file.

Feature requests

  1. Uses only base form but should offer the option to use inflected forms
  2. Tag cards that contain no kanji
  3. Export to various formats