Simple Japanese IME: hayanyuu

The IME is about done.

http://sprunge.us/QbgV

It will download edict2 automatically to $XDG_CONFIG_HOME/hayanyuu.

It needs kakasi-cvs for UTF-8 conversion of hiragana -> katakana.

It no longer needs mecab. Instead the usage is a little different:

Let’s say you want to type:

例大祭の読み方はレイタイサイ

You begin by typing “reitaisai”. It finds the word in edict2 and offers the match on F3. Press F3 and the match is added to the “line so far”. Then type “no” and press F1 (hiragana), “yomikata” -> F3, “ha” -> F1, and “reitaisai” -> F2 (katakana). Finally, press Enter to echo the “line so far” to stdout.
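The key choices in this walkthrough can be modelled in a few lines. This is only a sketch, not hayanyuu's actual code. It assumes the romaji has already been converted to hiragana, uses a two-entry dictionary (`TINY_DICT`, a hypothetical stand-in for edict2), and converts to katakana with a Unicode code point offset rather than kakasi.

```python
# Hypothetical stand-in for an edict2 lookup: hiragana reading -> headword.
TINY_DICT = {"れいたいさい": "例大祭", "よみかた": "読み方"}


def to_katakana(hiragana: str) -> str:
    """Shift hiragana (U+3041..U+3096) to katakana, which sits 0x60 higher."""
    return "".join(
        chr(ord(ch) + 0x60) if "ぁ" <= ch <= "ゖ" else ch for ch in hiragana
    )


def choose(reading: str, key: str) -> str:
    """Return the text that one key press adds to the 'line so far'."""
    if key == "F1":  # keep the hiragana as typed
        return reading
    if key == "F2":  # katakana
        return to_katakana(reading)
    if key == "F3":  # dictionary match
        return TINY_DICT[reading]
    raise ValueError(f"unknown key: {key}")


def compose(steps) -> str:
    """Concatenate the choices in order, like the 'line so far'."""
    return "".join(choose(reading, key) for reading, key in steps)


steps = [
    ("れいたいさい", "F3"),
    ("の", "F1"),
    ("よみかた", "F3"),
    ("は", "F1"),
    ("れいたいさい", "F2"),
]
print(compose(steps))  # 例大祭の読み方はレイタイサイ
```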

Inflections are harder. Maybe in a future version I’ll find a better way to do it, but for now, just type the base form of the verb/adjective (as it appears in edict2) and add the inflection after it in hiragana. Then remove the redundant ending of the base form afterwards. E.g.:

言って

“iu” -> 言う (F3). “tte” -> F1. Enter. This gives 言うって; remove the redundant う to get 言って.
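The manual clean-up step could be sketched as below. This is not part of hayanyuu. The helper name `inflect` is hypothetical, and it assumes the inflection replaces exactly the final kana of the dictionary form, which holds for this example but not for every verb.

```python
def inflect(base: str, ending: str) -> str:
    """Drop the final kana of the dictionary form and append the typed ending."""
    return base[:-1] + ending


print(inflect("言う", "って"))  # 言って
```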

Lastly, here is another hint for xbindkeys:
"xterm -e 'hayanyuu >/tmp/hayanyuu_latest'; xdotool type "$(cat /tmp/hayanyuu_latest)""

So you don’t even have to paste it.


One Response to Simple Japanese IME: hayanyuu

  1. procyon says:

    The xdotool type approach is a bit dodgy.

    It often only pastes a few characters…

    One thing that should be added is the multi-radical search.

    The biggest hurdle is making a database of radical aliases.

    This would allow for, for instance:

    Desired: 挨拶 (aisatu)

    List of data
    2 kanji compound
    first kanji, radical hand, radical katakana-mu, radical arrow
    second kanji, radical hand, radical river, radical evening
    first kanji, on reading ai, kun reading hiraku
    second kanji, on reading satu, kun reading semaru

    So, then you could say something like:
    2-hand,arrow-hand

    2 kanji compound – hand and arrow in the first (left) kanji – hand in the second (right) kanji

    And it would list everything in edict that matches that.

    Which is just aisatu!
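A query such as "2-hand,arrow-hand" could be parsed roughly like this. It is a sketch under assumptions: the query syntax is inferred from the single example in this comment, `parse_query` is a hypothetical name, and the alias table holds only the two radicals named here (hand as 扎, arrow as 矢, the symbols radkfile uses).

```python
# Hypothetical alias table: English radical name -> radkfile symbol.
ALIASES = {"hand": "扎", "arrow": "矢"}


def parse_query(query: str):
    """Split 'N-rads-rads...' into (N, [radicals for kanji 1, kanji 2, ...])."""
    count_text, *parts = query.split("-")
    count = int(count_text)
    if len(parts) != count:
        raise ValueError("the number of radical groups must match the kanji count")
    radicals = [[ALIASES[name] for name in part.split(",")] for part in parts]
    return count, radicals


print(parse_query("2-hand,arrow-hand"))  # (2, [['扎', '矢'], ['扎']])
```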

    I have already posted about this in the multi-radical search post, but I’ll repeat the process here:

    radkfile lists kanji with 矢:
    [挨医勧歓潅観擬疑矯凝矩権侯候喉嫉疾族短知智痴蜘迭鉄薙矧矢俟嗾埃嶷椥欸猴癡矣矮礙竢笶篌簇翳聟肄蔟踟醫鏃雉]

    and with 扎 (radkfile’s stand-in for the hand radical 扌):
    [挨握扱按掩援押拐拡撹掛括換揮技擬掬拒拠挟掘掲携捲抗拘控拷挫採搾拶撮擦捌撒指持捨授拾抄招捷擾拭振推据摺誓逝拙接摂折撰措捜掃挿掻操捉揃損打托択拓担探抽挑捗掴抵挺提摘擢哲撤投搭撞捺捻撚把播拝排拍抜搬挽批披描扶撫払扮捕抱捧撲抹摸揖揚揺擁抑掠扎扞扣扛扠扨扼抂抉找抒抓抖拔抃抔拗拑抻拆擔拈拜拌拊拂拇抛拉挌拮拱挧挂拯拵捐挾捍搜捏掖掎掀掫捶掏掉掟掵捫捩掾揩揀揆揣揉插揶揄搖搆搓搦搶攝搗搨搏摧摶摎攪撕撓撥撩撈撼據擒擅擇撻擂擱擠擡抬擣擯攬擶擴擲擺擽攘攜攅攤攫晢浙湃箝箍籀]

    The first kanji needs to match both, which is where you use the set_intersection script:
    [挨擬]
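The set_intersection script itself is not shown here, so the following is only a stand-in for what it does. The two inputs are the first few kanji of each radkfile list above, not the full lists.

```python
def set_intersection(first: str, second: str) -> str:
    """Keep the characters of `first` that also occur in `second`, in order."""
    members = set(second)
    return "".join(ch for ch in first if ch in members)


arrow = "挨医勧歓潅観擬疑矯凝"  # start of the 矢 list
hand = "挨握扱按掩援押拐拡撹掛括換揮技擬掬拒"  # start of the 扎 list
print(set_intersection(arrow, hand))  # 挨擬
```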

    And then you grep for those two lists in edict with some boundaries:
    grep -E '(^|;)[挨擬][挨握扱按掩援押拐拡撹掛括換揮技擬掬拒拠挟掘掲携捲抗拘控拷挫採搾拶撮擦捌撒指持捨授拾抄招捷擾拭振推据摺誓逝拙接摂折撰措捜掃挿掻操捉揃損打托択拓担探抽挑捗掴抵挺提摘擢哲撤投搭撞捺捻撚把播拝排拍抜搬挽批披描扶撫払扮捕抱捧撲抹摸揖揚揺擁抑掠扎扞扣扛扠扨扼抂抉找抒抓抖拔抃抔拗拑抻拆擔拈拜拌拊拂拇抛拉挌拮拱挧挂拯拵捐挾捍搜捏掖掎掀掫捶掏掉掟掵捫捩掾揩揀揆揣揉插揶揄搖搆搓搦搶攝搗搨搏摧摶摎攪撕撓撥撩撈撼據擒擅擇撻擂擱擠擡抬擣擯攬擶擴擲擺擽攘攜攅攤攫晢浙湃箝箍籀]( |\(|;)' edict2.utf

    Only hit:
    挨拶 [あいさつ] /(n,vs,adj-no) (1) greeting/greetings/salutation/salute/(2) speech (congratulatory or appreciative)/address/(3) reply/response/(4) (sl) revenge/retaliation/(exp) (5) (See 御挨拶) a fine thing to say (used as part of a sarcastic response to a rude remark)/(6) (orig. meaning) dialoging (with another Zen practitioner to ascertain their level of enlightenment)/(P)/EntL1151120X/
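The same boundary-anchored match can be sketched in Python, which makes it easier to build the pattern from the two sets. The assumptions are that the second set is cut down to the start of the hand list, and that the sample lines are abbreviated edict2-style entries for illustration; only the first comes from the hit above.

```python
import re


def compound_pattern(first_set: str, second_set: str) -> re.Pattern:
    """Two-kanji headword at the start of a line or after ';',
    followed by a space, '(' or ';'. The sets must contain only kanji."""
    return re.compile(rf"(^|;)[{first_set}][{second_set}]( |\(|;)")


first_set = "挨擬"
second_set = "挨握扱按掩援押拐拡撹掛括換揮技擬掬拒拠挟掘掲携捲抗拘控拷挫採搾拶"

sample_lines = [
    "挨拶 [あいさつ] /(n,vs,adj-no) (1) greeting/",
    "握手 [あくしゅ] /(n,vs) handshake/",
    "擬人 [ぎじん] /(n) personification/",
]

pattern = compound_pattern(first_set, second_set)
hits = [line for line in sample_lines if pattern.search(line)]
print(hits)
```

Only the 挨拶 line matches: 握 is not in the first set, and 人 is not in the second.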

    But yeah, naming all the radicals will be a bit tricky. Maybe I should make some kind of selection like in:
    http://jisho.org/kanji/radicals/

    Especially if it can auto-shrink a list.
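One way to read "auto-shrink": after each pick, offer only the radicals that still occur in some remaining candidate. A minimal sketch of that idea follows, using only the two kanji and the radical names from the list of data above; the table and function names are hypothetical.

```python
# Hypothetical table built from the two kanji described in this comment.
KANJI_RADICALS = {
    "挨": {"hand", "katakana-mu", "arrow"},
    "拶": {"hand", "river", "evening"},
}


def candidates(selected: set) -> set:
    """Kanji that contain every selected radical."""
    return {k for k, rads in KANJI_RADICALS.items() if selected <= rads}


def remaining_radicals(selected: set) -> set:
    """Radicals still worth offering: those found in a candidate, minus the picks."""
    remaining = set()
    for kanji in candidates(selected):
        remaining |= KANJI_RADICALS[kanji]
    return remaining - selected


print(sorted(candidates({"arrow"})))          # ['挨']
print(sorted(remaining_radicals({"arrow"})))  # ['hand', 'katakana-mu']
```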
