From: Marc Maurer (j.m.maurer@student.utwente.nl)
Date: Wed Jul 09 2003 - 07:44:08 EDT
Start of a Dutch dictionary
CVS: ----------------------------------------------------------------------
CVS: Enter Log. Lines beginning with `CVS:' are removed automatically
CVS:
CVS: Committing in .
CVS:
CVS: Modified Files:
CVS: dic/Makefile.am
CVS: Added Files:
CVS: dic/nl.dic
CVS: ----------------------------------------------------------------------
Marc
On Wed 09-07-2003, at 12:55, Nadav Rotem wrote:
> Hi
>
> At the moment, Open Text Summarizer can only summarize documents in
> English and Hebrew. Enabling AbiWord to summarize documents in your
> language is easy and fun! All you have to do is create a short text file
> that contains about 200 special words.
>
>
> Here is how it's done:
>
> Name your file (LangCode).dic (for example, en.dic for English).
> In that file, put words that are common in your language but
> are NOT the subject of any article. For example, the word "the" in
> English is very common but is not an "important" word.
> In other words, we can find the word "the" in almost every sentence and
> we can't tell anything about the sentence from it. Another example is
> the word "such", which is redundant (for this purpose).
> I know it's a little strange, but it works.
>
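A minimal sketch of what such a dictionary does, in Python 3.9+. It assumes a plain one-word-per-line .dic layout (the actual OTS file format is not described in this message), and the helper names load_dic and content_words are hypothetical. The idea is only that words listed in the dictionary are dropped before any keyword scoring happens.

```python
def load_dic(text: str) -> set[str]:
    """Parse a hypothetical one-word-per-line .dic file into a lowercase set."""
    words = set()
    for line in text.splitlines():
        word = line.strip()
        if word:  # skip blank lines
            words.add(word.lower())
    return words


def content_words(sentence: str, stop_words: set[str]) -> list[str]:
    """Return the words of `sentence` that are NOT listed in the dictionary."""
    return [w for w in sentence.lower().split() if w not in stop_words]


en_dic = load_dic("the\na\nto\nsuch\n")
print(content_words("The letter to Becky arrived in October", en_dic))
# ['letter', 'becky', 'arrived', 'in', 'october']
```

Note that "in" survives only because it is missing from this toy dictionary; a real en.dic would list it too.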
> Here is what I do. I take a UTF-8 text file (it has to be Unicode) and
> ask OTS to tell me which words it thinks are keywords in the article,
> like this:
>
> ots letter.txt --dic=he --keywords | more
>
> where "he" selects the Hebrew dictionary file (he.dic) and letter.txt is
> the text file.
>
> Here is an example of such output (in English this time):
> Word[15][to]
> Word[8][the]
> Word[6][a]
> Word[5][love]
> Word[5][Becky]
> Word[5][October]
> Word[5][north]
> ...
> ...
>
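A rough approximation of the frequency list above, not the real OTS keyword algorithm: this sketch counts raw word occurrences in a UTF-8 text and prints them in the same Word[count][word] form. The function names are hypothetical, words are counted case-sensitively, and tokenization is a simple Unicode-letter regex, so OTS may split or normalize words differently.

```python
import re
from collections import Counter


def keyword_report(text: str, top: int = 10) -> list[str]:
    """Return the `top` most frequent words as Word[count][word] lines."""
    # Runs of Unicode letters only (no digits or underscores); punctuation
    # and apostrophes split words. Case is preserved, so "The" != "the".
    words = re.findall(r"[^\W\d_]+", text)
    counts = Counter(words)
    return [f"Word[{n}][{w}]" for w, n in counts.most_common(top)]


def report_file(path: str, top: int = 10) -> list[str]:
    """Read a UTF-8 text file (e.g. letter.txt) and build its report."""
    with open(path, encoding="utf-8") as f:
        return keyword_report(f.read(), top)


for line in keyword_report("to the north to Becky to the sea", top=2):
    print(line)
# Word[3][to]
# Word[2][the]
```

Because the regex matches any Unicode letter, the same sketch works unchanged on Hebrew or Dutch text.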
> As you can see, the word "to" appears 15 times in the text. "To" is not a
> keyword, so we need to place it in our dictionary file. The same goes
> for "the" and "a". Translating doc/en.dic would work for most Germanic
> languages. Just play with it until you feel you've got it right.
>
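The trial-and-error loop described above might be sketched as follows (Python 3.9+; the helper name dictionary_candidates is hypothetical, and stop words are assumed to be stored in lowercase). Given word counts, such as the figures in the example output, it lists the most frequent words not yet in the dictionary as candidates for a human to accept or reject. Nothing is added automatically, since only a speaker of the language can judge that "love" or "north" is a real keyword.

```python
from collections import Counter


def dictionary_candidates(counts: Counter, stop_words: set[str], n: int = 10) -> list[str]:
    """List the n most frequent words that are not yet in the dictionary."""
    # most_common() without an argument sorts stably, so words with equal
    # counts keep their original order. Lookup is case-insensitive.
    return [w for w, _ in counts.most_common() if w.lower() not in stop_words][:n]


# Counts taken from the example output above.
counts = Counter({"to": 15, "the": 8, "a": 6, "love": 5,
                  "Becky": 5, "October": 5, "north": 5})

print(dictionary_candidates(counts, set(), n=3))
# ['to', 'the', 'a']  (not keywords, so they go into the .dic file)
print(dictionary_candidates(counts, {"to", "the", "a"}, n=4))
# ['love', 'Becky', 'October', 'north']
```

Each pass through the loop shrinks the list of filler words at the top, until the remaining candidates look like genuine keywords.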
> For more info, look at http://www.abisource.com/lxr/source/ots/README
>
> Other OTS related news:
> * OTS made it into Gentoo! To get OTS 0.2.0 on Gentoo, type
> "emerge ots".
-- Marc Maurer <j.m.maurer@student.utwente.nl>
This archive was generated by hypermail 2.1.4 : Wed Jul 09 2003 - 07:51:09 EDT