2011-02-16 Németh László <nemeth at numbertext dot org>:
* src/*/Makefile.am: fix library versioning, the problem reported by
Rene Engelhard and Simon Brouwer.

* man/hunspell.4: new version based on the revised version of Ruud Baars

2011-02-02 Németh László <nemeth at OOo>:
* suggestmgr.cxx: fix ngram PHONE suggestion for input words with
diacritics using UTF-8 encoded dictionaries (add byte length to the
8-bit phonet() argument instead of character length)

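The byte-length versus character-length distinction in the entry above can be seen in a few lines of Python. This is only an illustration of the mismatch, not Hunspell code; the sample word is taken from elsewhere in this log.

```python
# Illustrative only: a routine that walks 8-bit units needs the byte length.
word = "vadász"                        # six characters, one of them non-ASCII
char_len = len(word)                   # length counted in characters
byte_len = len(word.encode("utf-8"))   # length counted in UTF-8 bytes

print(char_len, byte_len)
```

Passing the character count to a byte-oriented routine would cut the word short by one byte for every two-byte character it contains.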
* suggestmgr.cxx: fix missing csconv problem with UTF-8 encoded
dictionaries, when the input contains non-BMP characters
- tests/utf8_nonbmp.sug: test file

* suggestmgr.cxx: mixed and keyboard based character suggestions
don't forbid ngram suggestion search (optimized tests/suggestiontest)

* affixmgr.cxx: fix hun#2999225: interfering compounding mechanisms,
tested on Dutch word list and reported by Ruud Baars

* affixmgr.cxx: allomorph fix for hun#2970240 (Hungarian
compound "vadász+gép" was analyzed as vad+ász+gép, and rejected
by the ss->s rep rule (verb "vadássz"), but the analysis
didn't continue for the longer word parts (vadász+gép)).

* csutil.cxx: add lang code "az_AZ", "hu_HU", "tr_TR" for back
compatibility (fixing Azeri and Turkish casing conversion, also
Hungarian compound handling)

* affixmgr.cxx: fix morphological analysis

2011-01-26 Németh László <nemeth at OOo>:
* affixmgr.cxx: fix for moz#626195 (memcheck problem with FULLSTRIP).

* affixmgr.*, suggestmgr.cxx: FORBIDWARN parameter (see manual)

2011-01-24 Németh László <nemeth at OOo>:
* suffixmgr.cxx: fix bad suggestion of forbidden compound words, e.g.
"termijndoel" with the Dutch dictionary. Reported by Ruud Baars.

* latexparser.cxx: fix double apostrophe TeX quotation mark tokenization
(hun#3119776), reported by Wybodekker at SF.net.

* tests/suggestiontest/*: multilanguage and single Hunspell version, see README
* tests/suggestiontest/prepare2: for make -f Makefile.orig single

2011-01-22 Németh László <nemeth at OOo>:
* affixmgr.*, suggestmgr.*: new features
ONLYMAXDIFF: remove all bad ngram suggestions (default mode keeps one)
NONGRAMSUGGEST: similar to NOSUGGEST, but it forbids using the word
in ngram-based (more than 1-character distance) suggestions.

2011-01-21 Németh László <nemeth at OOo>:
* suggestmgr.*: limit wild suggestions (hun#2970237 by Ruud Baars)
- limited compound word suggestions
- improved and limited ngram based suggestions
* tests/*.sug: modified test files
- feature MAXCPDSUGS:
    MAXCPDSUGS 0 : no compound suggestion, suggested by
    Finn Gruwier Larsen in hunfeat#2836033
    MAXCPDSUGS n : max. ~n compound suggestions
- feature MAXDIFF: difference limit for ngram suggestions: 0-10
    e.g. MAXDIFF 5: normal (default) limit
    MAXDIFF 0: only one ngram suggestion
    MAXDIFF 10: ~maxngramsugs ngram suggestions

* affixmgr.*, hunspell.*: add flag FORCEUCASE (hun#2999228) to force
capitalization of compound words (see Hunspell(4) manual),
suggested by Ruud Baars
tests/forceucase.*: test files

* affixmgr.*, hunspell.*: add flag WARN (hun#1808861), optional warning feature
for rare words, suggested by Ruud Baars
tests/warn: test files
* tools/hunspell.cxx: add option -r for optional filtering of rare words

* affixmgr.cxx: fix hun#3161359 (gcc warnings) reported by Ryan VanderMeulen.

2011-01-17 Németh László <nemeth at OOo>:
* suggestmgr.cxx: fix hun#3158994 and hun#3159027 (missing csconv table
using awkward 8bit capitalization of UTF-8 encoded dictionary words with PHONE
suggestion, reported by benjarobin and dicollecte at SF.net).

2011-01-13 Németh László <nemeth at OOo>:
* affixmgr.cxx: ONLYINCOMPOUND fix for hun#2999224 (fogemorpheme
was allowed in end position of compoundings). Reported by Ruud Baars.
* tests/onlyincompound2.*: test files

2011-01-10 Ingo H. de Boer <idb_winshell at SF.net>:
* win_api/{hunspell,libhunspell,testparser}.vcproj: updated project
files for the library and the executables. Compiling problem
also reported by Don Walker.

2011-01-06 Németh László <nemeth at OOo>:
* affixmgr.cxx: fix freedesktop#32850 (program halt during Hungarian
spell checking of the word "6csillagocska6", reported by András Tímár)

* tools/hunspell.cxx: add Mac OS X Hunspell dictionary paths, requested by
Vidar Gundersen in hunfeat#3142010

2011-01-05 Caolán McNamara <cmc at OOo>:
* moz#620626 NS_UNICHARUTIL_CID doesn't support
case conversion

2011-01-03 Németh László <nemeth at OOo>:
* NEWS and THANKS: update for release 1.2.13

2010-12-20 Németh László <nemeth at OOo>:
* affixmgr.cxx: hun#3140784

2010-12-16 Németh László <nemeth at OOo>:
* affixmgr.cxx:
- improved fix of hun#2970242 (supporting
zero affixes), reported by Ruud Baars
- tests/opentaal_cpdpat{,2}: test files

- switching off default BREAK parameters by BREAK 0,
reported by Ruud Baars

- hun#2999225: interfering compounding mechanisms, reported by Ruud Baars

2010-12-11 Németh László <nemeth at OOo>:
* affixmgr.cxx: fix hun#2970242 (CHECKCOMPOUNDPATTERN only with flags),
the bug reported by Ruud Baars
* tests/2970242.*: test files

* tests/2970240.*: test files for CHECKCOMPOUNDPATTERN fix (check all
boundaries in compound words, fixed by the previous CHECKCOMPOUNDREP
fix), the bug reported by Ruud Baars

* win_api/Makefile.cygwin: update

2010-12-09 Caolán McNamara <cmc at OOo>:
* moz#617953 fix leak

2010-11-08 Caolán McNamara <cmc at OOo>:
* rhbz#650503 crash in Arabic dictionary

2010-11-05 Caolán McNamara <cmc at OOo>:
* rhbz#648740 don't warn on empty flagvector

2010-11-03 Caolán McNamara <cmc at OOo>:
* logically we shouldn't need a csconv table in utf-8 mode

2010-10-27 Németh László <nemeth at OOo>:
* hun#3000055 (requested by Ruud Baars) add REP boundary specification:
    REP ^word$ xxxx
    REP ^wordstarting xxxx
    REP wordending$ xxxx

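A rough sketch of how such anchored REP rules might be applied when generating replacement candidates. It assumes only what the entry above states (a leading ^ ties the pattern to the word start, a trailing $ to the word end); the function name and rule representation are hypothetical and this is not the suggestmgr.cxx implementation.

```python
def rep_candidates(word, rules):
    """Return replacement candidates for `word`.

    `rules` is a list of (pattern, replacement) pairs. A leading '^'
    anchors the pattern at the word start, a trailing '$' at the word
    end; an unanchored pattern may match at any position.
    """
    out = []
    for pattern, repl in rules:
        at_start = pattern.startswith("^")
        at_end = pattern.endswith("$")
        core = pattern[1 if at_start else 0:len(pattern) - (1 if at_end else 0)]
        if not core:
            continue  # nothing left to match after removing the anchors
        for i in range(len(word) - len(core) + 1):
            if word[i:i + len(core)] != core:
                continue
            if at_start and i != 0:
                continue
            if at_end and i + len(core) != len(word):
                continue
            cand = word[:i] + repl + word[i + len(core):]
            if cand not in out:
                out.append(cand)
    return out


print(rep_candidates("abab", [("ab", "x")]))    # both positions match
print(rep_candidates("abab", [("^ab", "x")]))   # only the word start
print(rep_candidates("abab", [("ab$", "x")]))   # only the word end
```

A real checker would still look each candidate up in the dictionary before offering it.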
* hun#3008434 (requested by Adrián Chaves Fernández) and
hun#3018929 (requested by Ruud Baars): REP with more than 2 words:
    REP morethantwo more_than_two

* suggestmgr.cxx: fix incomplete suggestion list for capitalized words,
e.g. missing Machtstrijd->Machtsstrijd in the Dutch dictionary
(reported by Ruud Baars)

* tests, man: related updates

2010-10-12 Caolán McNamara <cmc at OOo>:
* moz#603311 HashMgr::load_tables leaks dict when decode_flags fails
* fix mem leak found with new tests
* hun#3084340 allow underscores in html entity names

2010-10-07 Németh László <nemeth at OOo>:
* affixmgr.cxx:
- hun#2970239 fix bad suggestion of forbidden compound words
- hun#2999224 fix keepcase feature on compound words (only partial
fix for COMPOUNDRULE based compounding)
- fix checkcompoundrep feature in compound words (check all boundaries,
not only the last one)
Problems reported by Ruud Baars.

* tests/opentaal_forbiddenword[12]*, tests/opentaal_keepcase*:
new test files for the previous fixes
* tests/checkcompoundrep: extended test file.

2010-09-05 Caolán McNamara <cmc at OOo>:
* moz#583582 fix double buffer gcc fortify issue

2010-08-13 Caolán McNamara <cmc at OOo>:
* moz#586671 AffixMgr::parse_convtable leaks pattern/pattern2 if it
can't create both
* moz#586686 tidy up get_xml_list and friends

2010-08-10 Caolán McNamara <cmc at OOo>:
* hun#3022860 fix: remove duplicate code

2010-07-17 Caolán McNamara <cmc at OOo>:
* remove unused get_default_enc and avoid potential misrecognition of
three letter language ids
* normalize encoding names before lookup

2010-07-05 Caolán McNamara <cmc at OOo>:
* hun#2286060 add Hangul syllables to unicode tables

2010-06-26 Caolán McNamara <cmc at OOo>:
* moz#571728 keep new[]/delete[] wrappers in sync for embedded in moz
case

2010-06-13 Caolán McNamara <cmc at OOo>:
* moz#571728 keep new[]/delete[] wrappers in sync for embedded in moz
case

2010-06-02 Caolán McNamara <cmc at OOo>:
* moz#569611 compile cleanly under win64

2010-05-22 Caolán McNamara <cmc at OOo>:
* moz#525581 apply mozilla's current preferred get_current_cs impl

2010-05-17 Németh László <nemeth at OOo>:
* affixmgr.cxx: fix bad limitation of parenthesized flags at
COMPOUNDRULEs. Windows crash reported by Ruud Baars and Simon Brouwer.

2010-05-05 Caolán McNamara <cmc at OOo>:
* rhbz#589326 malloc of int that should have been of char**
* hun#2997388 fix ironic misspellings

2010-04-28 Caolán McNamara <cmc at OOo>:
* moz#550942 get_xml_list doesn't handle failure from get_xml_par

2010-04-27 Caolán McNamara <cmc at OOo>:
* moz#465612 mozilla-specific code leaks
* moz#430900 phone is dereferenced before oom check
* moz#418348 ckey_utf alloc is used unchecked in SuggestMgr::badcharkey_utf
* CID#1487 pointer "rl" dereferenced before NULL check
* CID#1464 Returned without freeing storage "ptr"
* CID#1459 Avoid duplicate strchr
* CID#1443 Avoid any chance of dereferencing *slst
* CID#1442 Unsafe to have a null morph
* CID#1440 Avoid null filenames
* CID#1302 Dereferencing NULL value "apostrophe"
* CID#1441 Avoid dereferencing null ppfx

2010-04-16 Caolán McNamara <cmc at OOo>:
* hun#2344123 fix U)ncap in utf-8 locale
* fix up hunspell text UI and lines wider than terminal

2010-04-15 Caolán McNamara <cmc at OOo>:
* hun#2613701 fix small leak in FileMgr::FileMgr
* fix small leak in tools/hunspell
* hun#2871300 avoid crash if def and words are NULL
* hun#2904479 fix length of hzip file
* hun#2986756 mingw build fix
* hun#2986756 fix double-free
* hun#2059896 fix crash in interactive mode without nls
* hun#2917914 add some extra words to the latexparser
* make some structs static
* C-api has duped symbol names
* regenerate gettext/intl with recent version
* hun#2796772 build a .dll under MinGW
* rhbz#502387 allow cross-compiling for MinGW target
* hun#2467643 update .vcproj files to include replist.?xx
* unify visibility/dll_export support across platforms
* hun#2831289 sizeof(short) typo
* hun#2986756 add -u3 gcc style output

2010-04-14 Caolán McNamara <cmc at OOo>:
* hun#2813804 fix segfault on hu_HU stemming

2010-04-13 Caolán McNamara <cmc at OOo>:
* hun#2806689 fix ironic misspellings
* hun#2836240 add Italian translations

2010-04-09 Caolán McNamara <cmc at OOo>:
* fix titchy possible leak in command-line spellchecker

2010-04-07 Caolán McNamara <cmc at OOo>:
* hun#2973827 apply win64 patch
* hun#2005643 fix broken mystrdup

2010-03-04 Caolán McNamara <cmc at OOo>:
* ooo#107768 fix crash in long strings in spellml mode
* hun#1999737 add some malloc checks
* hun#1999769 drop old buffer on realloc failure
* hun#2005643 tidy string functions
* hun#2005643 micro-opt
* hun#2006077 free strings on failed dict parse
* hun#2110783 ispell-alike verbose mode implementation

2010-03-03 Németh László <nemeth at OOo>:
* hunspell/(affixmgr, suggestmgr).cxx: add character sequence
support for MAP suggestion, using parenthesized character groups
in the syntax, e.g. MAP ß(ss).
* man/hunspell.4, tests/map*: documentation and test files

2010-02-25 Németh László <nemeth at OOo>:
* hunspell/hunspell.cxx: add recursion limit for BREAK (fix OOo Issue 106267)

* hunspell/hunspell.cxx: fix crash in morphological analysis of
capitalized words with ending dashes

* affixmgr.cxx: fix morphological analysis of long numbers combined with dash,
e.g. 45-00000045 (reported by a@freeblog.hu).

2010-02-23 Caolán McNamara <cmc at OOo>:
* hun#2314461 improve ispell-alike mode
* hun#2784983 improve default language detection
* hun#2812045 fix some compiler warnings
* hun#2910695 survive missing HOME dir
* hun#2934195 fix suggestmgr crash
* hun#2921129 remove unused variables
* hun#2826164 make sure make check uses the in-tree libhunspell
* bump toolchain to support --disable-rpath
* hun#2843984 fix coverity warning
* hun#2843986 fix coverity warning
* hun#2077630 add iconv lib
* make gcc strict-aliasing warning free
* make cppcheck warning free

2008-11-01 Németh László <nemeth at OOo>:
* replist.*, hunspell.cxx, affixmgr.cxx: new input and output
conversion support, see ICONV and OCONV keywords in the Hunspell(4)
manual page and the test examples. The input/output conversion
problem of syllabic languages reported by Daniel Yacob and
Shewangizaw Gulilat.
- tests/{iconv,oconv}.*: test examples

* tools/wordforms: word generation script for dictionary developers
(Hunspell version of the unmunch program)

* hunspell/hunspell.cxx: extended BREAK feature: in break patterns,
^ and $ mean the beginning and end of the word.
- tests/BREAK.*: modified examples.

* hunspell/hunspell.cxx: set default break at hyphen characters.
The associated problem reported by S Page in Hunspell Bug 2174061.
See Mozilla Bug ID 355178 and OOo Issue 64400, too.
- tests/breakdefault.*: test data
The following definition is equivalent to the default word break:

    BREAK 3
    BREAK -
    BREAK ^-
    BREAK -$

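One way to picture the three default break patterns above is the following minimal sketch. It is an assumption-laden illustration, not the hunspell.cxx code: the function name, the set-based lexicon, and the max_depth guard (which only echoes the BREAK recursion limit mentioned in the 2010-02-25 entry) are all hypothetical.

```python
DEFAULT_BREAKS = ("-", "^-", "-$")


def accepts(word, lexicon, breaks=DEFAULT_BREAKS, depth=0, max_depth=10):
    """Accept `word` if it is in `lexicon` or splits into accepted parts."""
    if word in lexicon:
        return True
    if depth >= max_depth:
        return False  # hypothetical guard against runaway recursion
    for pat in breaks:
        if pat.startswith("^"):
            # '^x': strip x from the beginning of the word
            core = pat[1:]
            if core and word.startswith(core) and len(word) > len(core):
                if accepts(word[len(core):], lexicon, breaks, depth + 1, max_depth):
                    return True
        elif pat.endswith("$"):
            # 'x$': strip x from the end of the word
            core = pat[:-1]
            if core and word.endswith(core) and len(word) > len(core):
                if accepts(word[:-len(core)], lexicon, breaks, depth + 1, max_depth):
                    return True
        else:
            # plain pattern: split inside the word, both parts non-empty
            pos = word.find(pat, 1)
            while pos != -1 and pos + len(pat) < len(word):
                left, right = word[:pos], word[pos + len(pat):]
                if (accepts(left, lexicon, breaks, depth + 1, max_depth)
                        and accepts(right, lexicon, breaks, depth + 1, max_depth)):
                    return True
                pos = word.find(pat, pos + 1)
    return False


lexicon = {"foo", "bar"}
print(accepts("foo-bar", lexicon), accepts("-foo", lexicon), accepts("foo-baz", lexicon))
```

Under these assumptions a hyphenated word passes only when every part on either side of a break is itself accepted.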
* affixmgr.cxx: SIMPLIFIEDTRIPLE is a new affix file keyword to allow
simplified forms of the compound words with triple repeating letters.
It is useful for Swedish and Norwegian languages.

* affixmgr.cxx: extend CHECKCOMPOUNDPATTERN to support
alternations of compound words, for example by the sandhi
feature of Indian and other languages. The problem reported
by Kiran Chittella associated with Telugu writing system
(see Telugu example in tests/checkcompoundpattern4.test).
The new optional field of CHECKCOMPOUNDPATTERN definition is the
replacement of the compound boundary defined by the previous fields:
    CHECKCOMPOUNDPATTERN ff f ff
means that the ff|f compound boundary has been replaced by "ff", like in
the (prereform) German Schiffahrt (Schiff+fahrt).
- CHECKCOMPOUNDPATTERN supports also optional flag conditions now:
    CHECKCOMPOUNDPATTERN ff/A f/B ff
means that the first word of the compound needs flag "A" and
the second word of the compound needs flag "B" for the operation.

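The boundary replacement described above can be sketched as follows, using the Schiffahrt example from the entry. The helper name and the plain set used as a lexicon are hypothetical, flag conditions are left out, and this is not the affixmgr.cxx algorithm.

```python
def split_simplified(word, end, begin, replacement, lexicon):
    """Undo a simplified compound boundary.

    For a rule 'CHECKCOMPOUNDPATTERN end begin replacement', every
    occurrence of `replacement` in `word` is tried as a boundary that
    originally read `end` + `begin`. Returns the (first, second) pairs
    whose members are both in `lexicon`.
    """
    results = []
    pos = word.find(replacement)
    while pos != -1:
        first = word[:pos] + end
        second = begin + word[pos + len(replacement):]
        if first in lexicon and second in lexicon:
            results.append((first, second))
        pos = word.find(replacement, pos + 1)
    return results


print(split_simplified("Schiffahrt", "ff", "f", "ff", {"Schiff", "fahrt"}))
```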
* tools/hunspell.cxx: add empty lines as separators to the output of
the stemming and morphological analysis.

* affixmgr.cxx: fix condition checking algorithm. Bad suggestion
generation reported by Mehmet Akin in SF.net Bug 2124186 with help of
Eleonora Goldman.

* affixmgr.cxx: fix COMPOUNDWORDMAX feature. The problem and its
code details reported by Göran Andersson under SF.net Bug ID 2138001.

* csutil.cxx: fix bad conditional code for Mozilla compilation.
Patch by Serge Gautherie. The problem reported by Ryan VanderMeulen.

* hunspell/hunspell.cxx: add missing ngram suggestion for HUHINITCAP
(capitalized mixed case) words.

* w_char.hxx: use GCC conditions for GCC related code. Patch by
Ryan VanderMeulen.

* affixmgr.cxx: check morphological description in morphgen()
(fix potential program fault by incomplete morphological
description of affix rules)

* src/win_api: config.h: switch on warning messages on Windows

* tools/affixcompress: extended help for -h (use LC_ALL=C sort
for input word list)

* man/hunspell.4: updated manual:
- new and modified features (SIMPLIFIEDTRIPLE, ICONV, OCONV,
BREAK, CHECKCOMPOUNDPATTERN).
- note about costs of zero affixes, suggested by Olivier Ronez.

* hunspell/hunspell.cxx: remove deprecated word breaking codes.

2008-08-15 Németh László <nemeth at OOo>:
* affentry.cxx: add FULLSTRIP option. With FULLSTRIP, affix rules can
strip full words, not only all but one of their characters. Suggested by
Davide Prina and other developers in OOo Issue 80145.
* tests/fullstrip.*: Test data based on Davide Prina's example.
* tools/unmunch.cxx: modified for FULLSTRIP.

* affixmgr.cxx: COMPOUNDRULE now works with long and numerical flag
types by parenthesized flags. Syntax: (flag)*, (flag)(flag)?(flag)*.
* tests/compoundrule[78].*: tests with parenthesized COMPOUNDRULE
definitions.

* suggestmgr.cxx: modified badchar*(), forgotchar*() and extrachar*()
1-character distance suggestion algorithms: search a TRY character
in all positions instead of all TRY characters in a character position
(it can give more readable suggestion order, also better suggestions
in the first positions, when TRY characters are sorted by frequency.)
For example, suggestions for "moze":
ooze, doze, Roze, maze, more etc. (Hunspell 1.2.6),
maze, more, mote, ooze, mole etc. (Hunspell 1.2.7).

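The effect of swapping the two loops can be shown with a small sketch of a badchar-style search (replace one character by a TRY character). The TRY string and the word list here are made up for the illustration, so the output does not reproduce the exact lists quoted above, and the functions are not the suggestmgr.cxx code.

```python
def badchar_by_position(word, try_chars, lexicon):
    """Older order: for every position, try every TRY character."""
    out = []
    for i in range(len(word)):
        for c in try_chars:
            cand = word[:i] + c + word[i + 1:]
            if c != word[i] and cand in lexicon and cand not in out:
                out.append(cand)
    return out


def badchar_by_try_char(word, try_chars, lexicon):
    """Newer order: for every TRY character, try every position."""
    out = []
    for c in try_chars:
        for i in range(len(word)):
            cand = word[:i] + c + word[i + 1:]
            if c != word[i] and cand in lexicon and cand not in out:
                out.append(cand)
    return out


TRY = "aertlod"  # hypothetical TRY string, assumed sorted by frequency
WORDS = {"ooze", "doze", "maze", "more", "mote", "mole"}

print(badchar_by_position("moze", TRY, WORDS))
print(badchar_by_try_char("moze", TRY, WORDS))
```

With the second ordering, candidates built from frequent letters come first regardless of where in the word the error sits.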
* suggestmgr.cxx: extended compound word checking for better COMPOUNDRULE
related suggestions, for example English ordinal numbers: 121323th ->
121323rd (it needs also a th->rd REP definition).

* phonet.cxx: cast unsigned char parameter of isdigit() and fix
isalpha by myisalpha() (potential problems in Windows environment).
Reported by Thomas Lange in OOo Issue 92736.

* hunspell/csutil.*,hunspell/{affentry,affixmgr,hunspell,suggestmgr}.cxx:
fix potential buffer overloading under morphological analysis by the
new mystrcat() function. Reported by Molnár Andor (dolhpy at true
dot hu) in SF.net Bug 2026203.

* affixmgr.cxx: add recursion limit to defcpd(). Fix OOo Issue 76067:
crash-like deceleration by checking hexadecimal numbers with long FFF
sequence (combinatorial explosion by the en_US words "f" and "ff").
Missing fix reported by Mathias Bauer.

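The combinatorial explosion mentioned above is easy to reproduce in isolation: a run of n letters "f" can be cut into the parts "f" and "ff" in a Fibonacci-growing number of ways. The counter below is a stand-alone illustration with a hypothetical function name; it does not model defcpd() itself.

```python
def count_segmentations(word, parts):
    """Count the ways `word` can be written as a concatenation of `parts`."""
    if not word:
        return 1
    return sum(
        count_segmentations(word[len(p):], parts)
        for p in parts
        if word.startswith(p)
    )


for n in (4, 10, 20):
    print(n, count_segmentations("f" * n, ("f", "ff")))
```

Each extra letter multiplies the work by roughly 1.6, which is why an exhaustive search over such input needs some limit.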
* affixmgr.cxx: fix the difference in the Unicode and non-Unicode
parts of cpdcase_check(). Bug report by Brett Wilson.

* filemgr.*, affixmgr.cxx, csutil.*, hashmgr.*: warning messages now
contain line numbers (use --with-warnings configure option for
warning messages).

* hunspell.cxx: analyze(): fix case conversion of stemming and
morphological analysis of UTF-8 encoded input. Reported by Ferenc Godó.

* tools/hunspell.cxx: fix LaTeX Unicode support in filter mode.
Reported by Jan Seeger in SF.net Bug 2039990.

* affixmgr.hxx: 0.5 MB (in a 64-bit environment, 1 MB) of (virtual) memory
saved by using only the requested size for sFlag and pFlag arrays.
Bug report by Brett Wilson.

* affixmgr.cxx,tools/hunspell.cxx: get_version() returns the full
VERSION affix parameter instead of its first word. Fixes for
Hunspell's header. Some problems with Hunspell header reported in
SF.net Bug 2043080.

2008-07-15 Németh László <nemeth at OOo>:
* affentry.cxx: fixes of the affix rule matching algorithm (affected
only the sk_SK dictionary from all OpenOffice.org dictionaries):
- fix dot pattern + accented letters matching (in non Unicode encoding)
- word-length conditions work again
* tests/condition.*: extended test for the fix.

* hashmgr.cxx: load multiword expressions: spaces may be parts
of the dictionary words again (but spaces also work as morphological
field separators: word word2 -> "word word2", word po:noun -> "word").
* man/hunspell.4: updated manual

* tools/hunspell.cxx: add iconv character conversion support to
stemming and morphological analysis

* tools/hunspell.cxx: add /usr/share/myspell/dicts search path for
Ubuntu support

2008-07-09 Németh László <nemeth at OOo>:
* affentry.cxx: fixes of the affix rule matching algorithm:
- right ASCII character handling in bracket expression;
- fault-tolerant nextchar() for bad rules.
Problem with the en_GB dictionary and nextchar() with a detailed
code analysis reported by John Winters in SF.net Bug ID 2012753.
* tests/condition.*: extended test for the fix.

* hunspell/hunspell.*, parsers/*, tools/hunspell.cxx: fix compiler
warnings (deprecated const-free char consts)

* win_api/hunspelldll.*: add hunspell_free_list(), the problem
reported by Laurie Mercer.

2008-06-30 Török László <torok_laszlo at users dot SF dot net>:
* tests/affixmgr.cxx: fix morphological analysis: strcat() on
an uninitialized char array in suffix_check_morph().

2008-06-18 Németh László <nemeth at OOo>:
* src/hunspell/affixmgr.cxx: fix GCC compiler warnings
(comparisons with string literal results in unspecified behaviour).
The problem reported by Ladislav Michnovič.

2008-06-17 Németh László <nemeth at OOo>:
* src/hunspell/{hunspell.cxx,hunspell.h}: add free_list() to the C and
C++ interface to deallocate suggestion lists. The problem
reported by Laurie Mercer and Christophe Paris.
* csutil.cxx: fix freelist() to deallocate non-NULL list, when n = 0.
* tools/{analyze,example,chmorph,hunspell}.cxx: use free_list().

* tools/hunspell.cxx: fix only --with-readline compiling problem.
Reported by Volkov Peter in SF.net Bug 1995842.

* man/hunspell.3,hunspell.hxx: fix analyze and generate examples in
the manual and comments (using char*** parameter instead of char**).

* tools/example.cxx: fix suggestion example.

2008-06-17 Németh László <nemeth at OOo>:
* affentry.cxx: fix the new affix rule matching algorithm of
Hunspell 1.2. Arabic dictionary problem reported by Khaled Hosny
in SF.net Bug ID 1975530. Mohamed Kebdani also sent a
prepared test data.
* tests/{1975530,condition*}: tests for the fix

2008-06-13 Ingo H. de Boer <idb_winshell at SF.net>:
* src/hunspell/{affixmgr.cxx,hunspell.cxx}: add missing type
cast to strstr() calls for VC8 compatibility.

2008-06-13 Németh László <nemeth at OOo>:
* suggestmgr.cxx: add also part1-part2 suggestion with dash
for bad part1part2 word forms, suggested by Ruud Baars.
For example, the suggestions for "parttime" are now "part time"
and "part-time".
NOTE: this feature will work only when the TRY definition
contains "-" or the letter "a".

* hunspell.cxx: new XML API in spell() and suggest() (see hunspell(3)).

* src/hunspell/*: fixes for OpenOffice.org build environment.

* man/{hunspell.3,hzip.1,hunzip.1}: add new manual pages for
Hunspell programming API and dictionary compression and
encryption utilities.

* src/hunspell/*: handle failed mystrdup() calls and other potential
insufficient memory problems. The problem reported by Elio Voci
in OpenOffice.org Issue 90604 and others.

* src/tools/affixmgr.cxx: restore original behaviour of get_wordchars
without conditional code. Problem reported by Ingo H. de Boer
in SF.net Bug 1763105.

* win_api/hunspelldll.h: put_word() renamed to add() in the (old)
Windows DLL API. Reported in SF.net Bug 1943236 and
by Bartkó Zoltán.

* tools/hunspell.cxx: fix chench() for environments without
native language support (ENABLE_NLS 0 in config.h),
PHP system_exec() bug reported by Michel Weimerskirch in
SF.net Bug 1951087.

* hunspell.cxx, affixmgr.cxx: remove "result" from the
(result && *result) conditions, when "result" is a static variable.
The problem and a possible solution reported by Ladislav Michnovič.

* affixmgr.cxx: parse_affix(): print line instead of NULL in
the warning message, when affix class header is bad.
The problem reported by Ladislav Michnovič.

2008-06-01 Christian Lohmaier <cloph at OOo>:
* configure.ac: patch to fix --with-readline, --with-ui logic.
Reported in the SF.net Bug 981395.

2008-05-04 Volkov Peter <volkov_peter at users sourceforge net>:
* configure.ac: fix LibTool 2.22 incompatibility by removing
unused LT_* macros. Report and patch in SF.net Bug 1957383.
The problem reported and fixed by Ladislav Michnovič, too.

2008-04-23 Ladislav Michnovič <lmichnovic at suse cz>:
* hunspell.pc.in: fix wrongly set directories.

2008-04-12 Németh László <nemeth at OOo>:
* src/tools/hunspell.cxx:
- Multilingual spell checking and special dictionary support with -d.
Multilingual spell checking suggested by Khaled Hosny (SF.net
Bug 1834280). Example for the new syntax:

    -d en_US,en_geo,en_med,de_DE,de_med

en_US and de_DE are base dictionaries, and en_geo, en_med, de_med
are special dictionaries (dictionaries without affix file).
Special dictionaries are optional extensions of the base dictionaries.
There is no explicit naming convention for special dictionaries,
only the ".dic" extension: dictionaries without affix file will
be an extension of the preceding base dictionary. First dictionary
in -d parameter must have an affix file (it must be a base
dictionary).

- new options for debugging, morphological analysis and stemming:
-m: morphological analysis or flag debug mode (without affix
rule data it shows the flags of the affix rules)
-s: stemming mode
-D: show also available dictionaries and search path
(suggested by Aaron Digulla in SF.net Bug 1902133)

- add missing refresh() to print bad words before the slower suggestion
search in UI (better user experience)

- fix tabulator problems (reported by ugli-kid-joe AT sf DOT net)

- fix different encoding of dic and input, and suggestions

- add per mille sign to LANG hu_HU section.

- rewrite program messages. Concatenating multiple printfs for
easier translation suggested by András Tímár and Gábor Kelemen.

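The grouping rule for the -d list in the entry above (a dictionary without an affix file extends the preceding base dictionary, and the first one must be a base dictionary) might be sketched like this. The function name and the has_affix_file callback are hypothetical stand-ins for a real file-system check; this is not the tools/hunspell.cxx code.

```python
def group_dictionaries(spec, has_affix_file):
    """Group a comma-separated -d value into (base, [special, ...]) pairs.

    `has_affix_file(name)` should report whether an affix file exists
    for `name`; names without one are attached to the preceding base.
    """
    groups = []
    for name in spec.split(","):
        if has_affix_file(name):
            groups.append((name, []))
        elif groups:
            groups[-1][1].append(name)
        else:
            raise ValueError("first dictionary must be a base dictionary: " + name)
    return groups


bases = {"en_US", "de_DE"}  # assumed to be the only names with affix files
print(group_dictionaries("en_US,en_geo,en_med,de_DE,de_med", lambda n: n in bases))
```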
* src/hunspell/csutil.cxx: set static encds variable. Patch by
Rene Engelhard. SF.net Bug 1896207 and 1939988.

* src/hunspell/w_char.hxx,csutil.hxx: reorganizing
w_char typedef and HENTRY_DATA, HENTRY_FIND consts

* src/hunspell/hunzip.cxx: fopen(): using rb options instead of r (fix
for Windows)

* src/tools/affixmgr.cxx: restore original behaviour of get_wordchars
in an #ifdef WINSHELL section. Problem reported by Ingo H. de Boer
in SF.net Bug 1763105.

* src/tools/chmorph.cxx: remove the experimental modifications

* src/tools/hzip.c: fopen(): using wb options instead of w (fix
for Windows)

* src/tools/hunzip.cxx: add missing MOZILLA_CLIENT. Reported
by Ryan VanderMeulen.

* man/*, man/hu/*: updated manual

* man/hunspell.4: fix formatting problem (missing header)

* tools/makealias: now works with the extra data fields.

* phonet.cxx: use HASHSIZE const

* tests/rep.aff: fix REP count

* src/win_api/Makefile.cygwin, README: native Windows compilation
in Cygwin environment without cygwin1.dll dependency (see README
for compiling instructions).

2008-04-08 Roland Smith <rsmith AT xs4all DOT nl>:
* src/parsers/latexparser.cxx: fix PATTERN_LEN for AMD64 and
other platforms with different struct padding (SF.net Bug 1937995).

2008-04-03 Kelemen Gábor <kelemeng AT gnome DOT hu>:
* po/POTFILES.in: fix path of the source file

* po/Makevars: add --from-code=UTF-8 gettext option

* hunspell.cxx: add comments for shortkey translation

2008-02-04 Flemming Frandsen <flfr AT stibo DOT com>:
* src/hunspell.h: fix Windows DLL support
- this patch also reported by Zoltán Bartkó.

2008-01-30 Mark McClain <marc_mcclain AT users DOT sf DOT net>:
* src/hunspell.cxx: stem(): fix function call side effect
for PPC platform (SF.net Bug 1882105).

2008-01-30 Németh László <nemeth at OOo>:
* hunspell.cxx, csutil.cxx, hunspelldll.c: fix
SF.net Bug 1851246, patch also by Ingo H. de Boer.

* hunspell.h: fix SF.net Bug 1856572 (C prototype problem),
patch by Mark de Does.

* hunspell.pc.in: fix SF.net Bug 1857450 wrong prefix, reported
by Mark de Does.

* hunspell.pc.in: reset numbering scheme: libhunspell-1.2.
Fix SF.net Bug 1857512 reported by Mark de Does,
also by Rene Engelhard.

* csutil.cxx: patches for ARM platform, signed_chars.dpatch
by Rene Engelhard and arm_structure_alignment.dpatch by
Steinar H. Gunderson <sesse@debian.org>

* hunzip.*, hzip.c: new hzip compression format

* tools/affixcompressor: affix compressor utility (similar to
munch, but it generates affix table automatically), works
with million-words dictionaries of agglutinative languages.

* README: fix problems reported by Pham Ngoc Khanh.

* csutil.cxx, suggestmgr: Warning-free in OOo builds.

* hashmgr.*, csutil.*: fix protected memory problems with
stored pointers on several non-x86 platforms by
store_pointer(), get_stored_pointer().

* src/tools/hunspell.cxx: fix iconv support on Solaris platform.

* tests/IJ.good: add missing test file

* csutil.cxx: fix const char* related errors. Compiling bug
with Visual C++ reported by Ryan VanderMeulen and Ingo H. de Boer.

2008-01-03 Caolan McNamara <cmc at OO.o>:
* csutil.cxx: SF.net Bug 1863239, notrailingcomma patch and
optimization of get_current_cs().

2007-11-01 Németh László <nemeth at OOo>:
* hunspell/*: new feature: morphological generation,
also fix experimental morphological analysis and stemming.
- new API functions and improved API:
  - analyze(word): (instead of morph()) morphological analysis
  - stem(word): stemming
  - stem(list): stemming based on the result of an analysis
  - generate(word, word2): morphological generation
  - generate(word, list): morphological generation
  - add(word): add word to the run-time dictionary (renamed put_word())
  - add_with_affix(word, word2): (renamed put_word_pattern()):
    add word to the run-time dictionary with affix flags of the
    second parameter: all affixed forms of the user words will be
    recognised by the spell checker. Especially useful for
    agglutinative languages.
  - remove(word): remove word from the run-time dictionary (not
    implemented)
- see manual and hunspell/hunspell.hxx header and tests/morph.*
* tests/morph.*: test data, example for morphological analysis,
stemming and generation

* tools/analyze, tools/chmorph: extended and new demo applications:
- analyze (originally hunmorph): analyses and stems input words,
generates word forms from input word pairs.
- chmorph: morphological transformation filter

* configure.ac, hunspell/makefile.am: set library version number.
Bug reported by Rene Engelhard.

* affentry.cxx, affixmgr.cxx: new pattern matching algorithm in
condition checking of affix rules instead of the Dömölki-algorithm:
- Unlimited condition length (instead of max. 8 characters).
- Less memory consumption, especially useful for affix rich languages:
5.4 MB memory savings with hu_HU dictionary.
- Speed change depends on dictionaries and CPU caches: English spell
checking is 4% faster on Linux words with en_US dictionary, Hungarian
spell checking is 25% slower on most frequent words of Hungarian
Webcorpus.

* tests/sug.*, sugutf.*: updated test data (use "a" and "lot"
dictionary items instead of "a lot".)

* src/hunspell/hunspell.cxx: free(csconv) instead of delete csconv.
Report and patch by Sylvain Paschein in Mozilla Issue 398268.

* suggestmgr.cxx, tools/hunspell.cxx: bad spelling of "misspelled".
Ubuntu Bug #134792, patch by Malcolm Parsons.

* tests/base_utf.*: use Unicode apostrophe instead of 8-bit one.

* hunspell.cxx, hashmgr.cxx: add(): use HashMgr::add()

2007-10-25 Pavel Janík <pjanik at OOo>:
|
|
* hunspell/csutil.cxx: Fix type cast warnings on 64bit Linux in
|
|
printing of character positions in u8_u16(). OOo issue 82984.
|
|
|
|
2007-09-05 Németh László <nemeth at OOo>:
|
|
* win_api/Hunspell.vproj, parsers/testparser.cxx,textparser.hxx:
|
|
warning fixes and removing unnecessary Windows project file.
|
|
Reported by Ingo H. de Boer.
|
|
|
|
* hashmgr.*, {affixmgr,suggestmgr}.cxx: optimized data structure
|
|
for variable-count fields (only "ph" transliteration field in
|
|
this version, see next item). Also less memory consumption:
|
|
-13% (0.75 MB) with en_US dictionary, -6% (1 MB) with hu_HU.
|
|
|
|
* suggestmgr.cxx: dictionary based phonetic suggestion for special
|
|
or foreign pronunciation (see also rule-based PHONE in manual).
|
|
Usage: tab separated field in dictionary lines, started with "ph:".
|
|
The field contains a phonetic transliteration of the word:
|
|
|
|
Marseille ph:maarsayl
|
|
* tests/phone.*: test data for dictionary and rule based phonetic
|
|
suggestion.
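As an illustration of the "ph:" field format described above, the following sketch extracts the transliteration from a tab separated dictionary line. The function name phoneticField is hypothetical; this is not the parser in hashmgr.cxx.

```cpp
#include <cstddef>
#include <string>

// Return the phonetic transliteration stored in a tab separated "ph:" field
// of a dictionary line, or an empty string if the line has no such field.
std::string phoneticField(const std::string& line) {
    std::size_t pos = 0;
    while (pos < line.size()) {
        std::size_t tab = line.find('\t', pos);
        if (tab == std::string::npos) tab = line.size();
        // A field that starts with "ph:" carries the transliteration.
        if (line.compare(pos, 3, "ph:") == 0) {
            return line.substr(pos + 3, tab - pos - 3);
        }
        pos = tab + 1;
    }
    return "";
}
```

For the line "Marseille<TAB>ph:maarsayl" the sketch returns "maarsayl".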
|
|
|
|
* hunspell.cxx: fix potential bad memory access in allcap word
|
|
capitalization in suggest() (bug of previous version).
|
|
|
|
* hunspell.cxx, atypes.hxx: set correct limit for UTF-8 encoded
|
|
input words (256 byte).
|
|
|
|
* suggestmgr.cxx: improved REP suggestions with spaces: it works
|
|
without dictionary modification.
|
|
OOo issue 80147, reported by Davide Prina.
|
|
* tests/rep.*: new test data: higher priority for "alot" -> "a lot",
|
|
and Italian suggestion "un'alunno" -> "un alunno".
|
|
|
|
* affixmgr.cxx: fix Unicode ngram suggestions in expand_rootword().
|
|
(Suggestions with bad affixes.)
|
|
Bug reported by Vitaly Piryatinksy <piv dot v dot vitaly at gmail>.
|
|
* tests/ngram_utf_fix.*: test based on Vitaly Piryatinksy's data.
|
|
|
|
* suggestmgr.cxx: fix twowords() for last UTF-8 multibyte character.
|
|
(conditional jump or move depended on uninitialised value).
|
|
|
|
2007-08-29 Ingo H. de Boer <idb_winshell at SF.net>:
|
|
* win_api/{hunspell,libhunspell, testparser}.vcproj: new project
|
|
files for the library and the executables.
|
|
|
|
* Hunspell.rc, Hunspell.sln, config.h: updated versions.
|
|
Version number problem also reported by András Tímár.
|
|
|
|
2007-08-27 Németh László <nemeth at OOo>:
|
|
* suggestmgr.hxx: put fixed version. Bug report by Ingo H. de Boer.
|
|
|
|
* suggestmgr.cxx: remove variable-length local character array
|
|
reported by Ingo H. de Boer.
|
|
|
|
2007-08-27 Németh László <nemeth at OOo>:
|
|
* suggestmgr.hxx: change bad time_t to clock_t in header, too.
|
|
Bug reports or patches by Ingo H. de Boer under SF.net
|
|
Bug ID 1781951, János Mohácsi and Gábor Zahemszky, András Tímár,
|
|
OMax3 at SF.net under SF.net Bug ID 1781592.
|
|
|
|
* phonet.*: change variable-length local character array to
|
|
portable fixed size character array. Problem reported by
|
|
Ingo H. de Boer under SF.net Bug ID 1781951 and
|
|
Ryan VanderMeulen.
|
|
|
|
* suggestmgr.cxx: remove debug message (also by
|
|
Ingo H. de Boer).
|
|
|
|
2007-08-26 Ingo H. de Boer <idb_winshell at SF.net>:
|
|
* win_api/Hunspell.vcproj: updated version (with phonet.*)
|
|
|
|
2007-08-23 Németh László <nemeth at OOo>:
|
|
* phonet.{c,h}xx, suggestmgr.cxx: PHONE parameter:
|
|
pronunciation based suggestion using Björn Jacke's original Aspell
|
|
phonetic transcription algorithm (http://aspell.net), relicensed
|
|
under GPL/LGPL/MPL tri-license with the permission of the author.
|
|
Usage: see manual.
|
|
|
|
* affixmgr,suggestmgr.cxx: add KEY parameter for keyboard and
|
|
input method error related suggestions.
|
|
Example: KEY qwertyuiop|asdfghjkl|zxcvbnm
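A minimal sketch of how a KEY string could drive suggestions, assuming that neighbouring characters in the string are neighbouring keys and that "|" separates keyboard rows. The function name keyboardCandidates is hypothetical; a spell checker would keep only the candidates that are dictionary words.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Candidates made by replacing one character of `word` with a key that is
// next to it in the KEY string. '|' separates rows, so it is never a
// neighbour and never a replaced character.
std::vector<std::string> keyboardCandidates(const std::string& word,
                                            const std::string& key) {
    std::vector<std::string> out;
    for (std::size_t i = 0; i < word.size(); ++i) {
        for (std::size_t k = 0; k < key.size(); ++k) {
            if (key[k] != word[i] || key[k] == '|') continue;
            if (k > 0 && key[k - 1] != '|') {
                std::string c = word;
                c[i] = key[k - 1];  // key to the left
                out.push_back(c);
            }
            if (k + 1 < key.size() && key[k + 1] != '|') {
                std::string c = word;
                c[i] = key[k + 1];  // key to the right
                out.push_back(c);
            }
        }
    }
    return out;
}
```

With the KEY string from the example, the candidates for "gave" include "have", because "h" is next to "g" in the second row.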
|
|
|
|
* man/hunspell.4: description about PHONE and KEY suggestion parameters.
|
|
|
|
* suggestmgr.cxx: enhancements for better suggestions:
|
|
- Set ngram suggestions for badchar-type errors
|
|
and only two word and compound word suggestions, too.
|
|
- Separate non-compound and compound word
|
|
suggestions for MAP suggestion, too.
|
|
- Double swap suggestions for short words.
|
|
For example: ahev -> have, hwihc -> which.
|
|
- Better time limits using clock() instead of time()
|
|
(tenths of a second resolution instead of second ones).
|
|
- leftcommonsubstring() weight function.
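The double swap item in the list above can be sketched as follows. The restriction to 4 and 5 letter words, the 8-bit text and the function name doubleSwapCandidates are assumptions made for the illustration; the entry only says "short words".

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Double swap candidates for 4 and 5 letter words: swap the first two and
// the last two characters (ahev -> have, hwihc -> which).
std::vector<std::string> doubleSwapCandidates(const std::string& word) {
    std::vector<std::string> out;
    const std::size_t n = word.size();
    if (n != 4 && n != 5) return out;
    std::string c = word;
    std::swap(c[0], c[1]);
    std::swap(c[n - 2], c[n - 1]);
    out.push_back(c);
    if (n == 5) {
        // Second variant: keep the first character, swap the next pair.
        c = word;
        std::swap(c[1], c[2]);
        std::swap(c[n - 2], c[n - 1]);
        out.push_back(c);
    }
    return out;
}
```

Each candidate would still have to be checked against the dictionary before it is offered as a suggestion.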
|
|
|
|
* htype.hxx, hashmgr.cxx: blen (byte length) and clen (character
|
|
length) fields instead of wlen
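The difference between the two lengths can be shown with a short sketch for UTF-8 input. The struct Lengths and the function utf8Lengths are hypothetical names that only mirror the blen and clen fields; they are not part of htypes.hxx.

```cpp
#include <cstddef>
#include <string>

// Byte length and character length of a UTF-8 string.
struct Lengths {
    std::size_t blen;  // number of bytes
    std::size_t clen;  // number of Unicode characters
};

Lengths utf8Lengths(const std::string& s) {
    Lengths len{s.size(), 0};
    for (unsigned char ch : s) {
        // Continuation bytes look like 10xxxxxx and do not start a character.
        if ((ch & 0xC0) != 0x80) ++len.clen;
    }
    return len;
}
```

For "vadász" encoded in UTF-8 the byte length is 7 and the character length is 6, because "á" takes two bytes.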
|
|
|
|
* affixmgr.cxx: fix get_syllable() for bad Unicode inputs.
|
|
|
|
* tests/suggestiontest/*: test environment for suggestions
|
|
|
|
2007-08-07 Martijn Wargers:
|
|
* csutil.cxx: fix Mingw build error associated with ToUpper() call.
|
|
Report and patch in Mozilla Issue 391447.
|
|
|
|
2007-08-07 Robert Longson:
|
|
* atypes.cxx: use empty inline function HUNSPELL_WARNING instead of
|
|
variadic macros to switch off Hunspell warnings.
|
|
Reported by Gavin Sharp in Mozilla Issue 391147.
|
|
|
|
2007-08-05 Ginn Chen:
|
|
* hashmgr.cxx: Hunspell failed to compile on OpenSolaris (use stdio
|
|
instead of cstdio). Report and patch in Mozilla Issue 391040.
|
|
|
|
2007-07-25 Németh László <nemeth at OOo>:
|
|
* parsers/*.cxx: Hunspell executable recognises and accepts URLs,
|
|
e-mail addresses, directory paths, reported by Jeppe Bundsgaard.
|
|
* src/tools/hunspell.cxx: --check-url: new option of Hunspell program.
|
|
Use --check-url if you want to check URLs, e-mail addresses and paths.
|
|
|
|
* parsers/textparser.cxx: strip colon at end of words for Finnish
|
|
and Swedish (colon may be in words in Finnish and Swedish).
|
|
Problem reported by Lars Aronsson.
|
|
* tests/colons_in_words.*: test data
|
|
|
|
* tests/digits_in_words.*: example for using digits in words
|
|
(eg. 1-jährig, 112-jährig etc. in German), reported by Lars Aronsson.
|
|
|
|
* hashmgr.cxx: Hunspell accepts allcaps forms of mixed case
|
|
words of personal dictionaries (+allcaps custom dictionary words with
|
|
allcaps affixes).
|
|
Sf.net Bug ID 1755272, reported by Ellis Miller.
|
|
|
|
* hashmgr.cxx: fix small memory leaks with alias compressed
|
|
dictionaries (free flag vectors of affixed personal dictionary words
|
|
and flag vectors of hidden capitalized forms of mixed case and
|
|
allcaps words).
|
|
|
|
* affixmgr.cxx: fix COMPOUNDRULE checking with affixed compounds.
|
|
Sf.net Bug ID 1706659, reported by Björn Jacke. Also fixing for
|
|
OOo Issue 76067 (crash-like deceleration for hexadecimal numbers
|
|
with long FFFFFF sequence using en_US dictionary).
|
|
|
|
* tools/hunspell.cxx: add missing return to save_privdic().
|
|
|
|
* man/hunspell.4: add information about affixation of personal words:
|
|
"Personal dictionaries are simple word lists, but with optional
|
|
word patterns for affixation, separated by a slash:
|
|
|
|
foo
|
|
Foo/Simpson
|
|
|
|
In this example, "foo" and "Foo" are personal words, plus Foo
|
|
will be recognised with affixes of Simpson (Foo's etc.)."
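A minimal sketch of reading such a personal dictionary line, assuming one entry per line and at most one slash. PersonalEntry and parsePersonalLine are hypothetical names, not functions of the Hunspell sources.

```cpp
#include <cstddef>
#include <string>

// A personal dictionary entry: the word itself and, optionally, a model
// word whose affixes it should share.
struct PersonalEntry {
    std::string word;
    std::string model;  // empty when no pattern is given
};

PersonalEntry parsePersonalLine(const std::string& line) {
    const std::size_t slash = line.find('/');
    if (slash == std::string::npos) return {line, ""};
    return {line.substr(0, slash), line.substr(slash + 1)};
}
```

For "Foo/Simpson" the sketch yields the word "Foo" with the model word "Simpson"; for "foo" the model is empty.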
|
|
|
|
2007-07-18 Németh László <nemeth at OOo>:
|
|
* src/win_api/: add missing resource files, reported by Ingo H. de Boer.
|
|
|
|
2007-07-16 Németh László <nemeth at OOo>:
|
|
* hunspell.cxx: fix dot removing from UTF-8 encoded words in cleanword2()
|
|
(Capitalised words with dots, such as "Something.", were not recognised
|
|
using Unicode encoded dictionaries.)
|
|
* tests/{base.*,base_utf.*}: extended and new test files for
|
|
dot removing and Unicode support.
|
|
|
|
* tools/hunspell.cxx: fix Cygwin, OS X compatibility using platform
|
|
specific iconv() header by ICONV_CONST macro of Autoconf.
|
|
Sf.net Bug ID 1746030, reported by Mike Tian-Jian Jiang.
|
|
Sf.net Bug ID 1753939, reported by Jean-Christophe Helary.
|
|
|
|
* tools/hunspell.cxx: fix missing global path setting with -d option.
|
|
|
|
* tests/test.sh: fix broken Valgrind checking (missing warnings
|
|
with VALGRIND=memcheck make check).
|
|
|
|
* csutil.cxx: fix condition in u8_u16() to avoid invalid read
|
|
of not null-terminated character arrays (detected by Valgrind
|
|
in Hunspell executable: associated with 8-bit character table
|
|
conversion in tools/hunspell.cxx).
|
|
|
|
* csutil.cxx: free_utf_tbl(): use utf_tbl_count-- instead of utf_tbl--.
|
|
Memory leak in Hunspell executable detected by Valgrind.
|
|
|
|
* hashmgr.cxx: add missing free_utf_tbl(), memory leak in Hunspell
|
|
executable detected by Valgrind.
|
|
|
|
* hashmgr.cxx: load_tables(): fix memory error in spec. capitalization.
|
|
Use sizeof(unsigned short) instead of bad sizeof(unsigned short*).
|
|
Invalid memory read detected by Valgrind.
|
|
|
|
* hashmgr.cxx: add_word(): fix memory error in spec. capitalization.
|
|
Update also affix array length of capitalized homonyms. Invalid
|
|
memory read detected by Valgrind.
|
|
|
|
* hunspell.cxx: suggest(): fix invalid memory write and leak.
|
|
Bad realloc() and missing free() detected by Valgrind associated
|
|
with suggestions for "something.The" type spelling errors.
|
|
|
|
* {dictmgr,csutil,hashmgr,suggestmgr}.cxx: check memory allocation.
|
|
Sf.net Bug ID 1747507, based on the patch by Jose da Silva.
|
|
|
|
2007-07-13 Ingo H. de Boer <idb_winshell at SF.net>:
|
|
* atypes.cxx: fix Visual C compatibility: Using
|
|
"HUNSPELL_WARNING(a,b,...) {}" macro instead of empty "X(a,b...)".
|
|
|
|
* hunspell.cxx: changes for Windows API.
|
|
* win_api/Hunspell.*: new resource files
|
|
* win_api/hunspelldll.*: set optional Hunspell and Borland spec. codes
|
|
Sf.net Bug ID 1753802, patch by Ingo H. de Boer.
|
|
See also Sf.net Bug ID 1751406, patch by Mike Tian-Jian Jiang.
|
|
|
|
2007-07-09 Caolan McNamara <cmc at OO.o>:
|
|
* {hunspell,hashmgr,affentry}.cxx: fix warnings of Coverity program
|
|
analyzer. Sf.net Bug ID 1750219.
|
|
|
|
2007-07-06 Németh László <nemeth at OOo>:
|
|
* atypes.cxx: warning-free swallowing of conditional warning messages
|
|
and their parameters using empty HUNSPELL_WARNING(a,b...) macro.
|
|
* {affixmgr,atypes,csutil}.cxx: fix unused variable warnings
|
|
using WARNVAR macro for conditionally named variables.
|
|
* hashmgr.cxx: fix unused variable warning in add_word() by cond. name
|
|
* hunspell.cxx: fix shadowed declaration of captype var. in suggest()
|
|
|
|
2006-06-29 Caolan McNamara <cmc at OO.o>:
|
|
* hunspell.cxx: patch to fix possible memory leak in analyze() of
|
|
experimental morphological analyzer code. Sf.net Bug ID 1745263.
|
|
|
|
2007-06-29 Németh László <nemeth at OOo>:
|
|
improvements:
|
|
* src/hunspell/hunspell.cxx: check bad capitalisation of Dutch letter IJ.
|
|
- Sf.net Feature Request ID 1640985, reported by Frank Fesevur.
|
|
- Solution: FORBIDDENWORD for capitalised word forms (need
|
|
an improved Dutch dictionary with forbidden words: Ijs/*, etc.).
|
|
* tests/IJ.*: test data and example.
|
|
|
|
* hashmgr.cxx, hunspell.cxx: check capitalization of special word forms
|
|
- words with mixed capitalisation: OpenOffice.org - OPENOFFICE.ORG
|
|
Sf.net Bug ID 1398550, reported by Dmitri Gabinski.
|
|
- allcap words and suffixes: UNICEF's - UNICEF'S
|
|
- prefixes with apostrophe and proper names: Sant'Elia - SANT'ELIA
|
|
For Catalan, French and Italian languages.
|
|
Reported by Davide Prina in OOo Issue 68568.
|
|
* tests/allcaps*: tests for OPENOFFICE.ORG, UNICEF'S capitalization.
|
|
* tests/i68568*: tests for SANT'ELIA capitalization.
|
|
|
|
* hunspell/hunspell.cxx: suggestion for missing sentence spacing:
|
|
something.The -> something. The
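The transformation can be sketched as below. The function name sentenceSpacingSuggestion is hypothetical, the sketch assumes 8-bit text, and it leaves out the dictionary check of both parts that a real suggestion would need.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// If a dot inside the word is directly followed by an uppercase letter
// ("something.The"), return the form with a space after the dot.
// Otherwise return an empty string.
std::string sentenceSpacingSuggestion(const std::string& word) {
    for (std::size_t i = 1; i + 1 < word.size(); ++i) {
        if (word[i] == '.' &&
            std::isupper(static_cast<unsigned char>(word[i + 1]))) {
            return word.substr(0, i + 1) + " " + word.substr(i + 1);
        }
    }
    return "";
}
```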
|
|
|
|
* tools/hunspell.cxx: multiple character encoding support
|
|
- -i option: custom input encoding
|
|
Sf.net Bug ID 1610866, reported by Thobias Schlemmer.
|
|
Sf.net Bug ID 1633413, reported by Dan Kenigsberg.
|
|
See also hunspell-1.1.5-encoding.patch of Fedora from Caolan McNamara.
|
|
* tests/*.test: add input encodings
|
|
|
|
* tools/hunspell.cxx: use locale data for default dictionary names.
|
|
Sf.net Bug ID 1731630, report and patch from Bernhard Rosenkraenzer,
|
|
See also hunspell-1.1.4-defaultdictfromlang.patch of Fedora Linux
|
|
from Caolan McNamara.
|
|
|
|
* tools/hunspell.cxx: fix 8-bit tokenization (letters without
|
|
casing, like ß or Hebrew characters, are now handled correctly)
|
|
|
|
* tools/hunspell.cxx: dictionary search path
|
|
- DICPATH environment variable
|
|
- -D option: show directory path of loaded dictionary
|
|
- automatic detection of OpenOffice.org directories
|
|
|
|
fixes:
|
|
* affixmgr.cxx: fault-tolerant patch for REP and other affix
|
|
table data problems. Problem with Hunspell and en_GB dictionary
|
|
reported by Thomas Lange in OOo Issue 76098 and
|
|
Stephan Bergmann in OOo Issue 76100.
|
|
Sf.net Bug ID 1698240, reported by Ingo H. de Boer.
|
|
|
|
* csutil.cxx: fix mkallcap_utf() for allcaps suggestion in UTF-8.
|
|
|
|
* suggestmgr.cxx: fix bad movechar_utf() (missing strlen()).
|
|
|
|
* hunspell.cxx: fix bad degree sign detection in Unicode
|
|
hu_HU environment.
|
|
|
|
* hunspell/hunspell.cxx: free allocated memory of csconv in
|
|
ported Mozilla code.
|
|
- Mozilla Bugzilla Bug 383564, report and Mozilla MySpell patch
|
|
by Andrew Geul. Reported by Ryan VanderMeulen for Hunspell.
|
|
|
|
* suggestmgr.cxx: fix minor difference in Unicode suggestion
|
|
(ngram suggestion of allcaps words in Unicode).
|
|
|
|
* hashmgr.cxx: close file handle after errors.
|
|
Sf.net Bug ID 1736286, reported by John Nisly.
|
|
|
|
* configure.ac: syntax error (shell variable with spaces).
|
|
Sf.net Bug ID 1731625, reported by Bernhard Rosenkraenzer.
|
|
|
|
* hunspell.cxx: check_word(): fix bad usage of info pointer.
|
|
|
|
* hashmgr.cxx: fix de_DE related bug (accept words with leading dash).
|
|
Sf.net Bug ID 1696134, reported by Björn Jacke.
|
|
|
|
* suggestmgr.cxx, tests/1695964.*: fix NEEDAFFIX homonym suggestion.
|
|
Sf.net Bug ID 1695964, reported by Björn Jacke.
|
|
|
|
* tests/1463589*: capitalized ngram suggestion test data for
|
|
Sf.net Bug ID 1463589, reported by Frederik Fouvry.
|
|
|
|
* csutil.cxx, affixmgr.cxx: fix possible heap error with
|
|
multiple instances of utf_tbl.
|
|
Sf.net Bug ID 1693875, reported by Ingo H. de Boer.
|
|
|
|
* affixmgr.cxx, suggestmgr.cxx, license.hunspell: convert to ASCII.
|
|
Locale dependent compiling problems. Sf.net Bug ID 1694379, reported
|
|
by Mike Tian-Jian Jiang. OOo Issue 78018 reported by Thomas Lange.
|
|
|
|
* tests/test.sh: compatibility issues
|
|
- fix Valgrind support (check shared library instead of shell wrapper)
|
|
- remove deprecated "tail +2" syntax
|
|
- set 8-bit locale for testing (LC_ALL=C)
|
|
|
|
* hunspell.hxx: remove license.* and config.h dependencies.
|
|
- hunspell-1.1.5-badheader.patch from Caolan McNamara <cmc at OO.o>
|
|
|
|
2007-03-21 Németh László <nemeth at OOo>:
|
|
* tools/Makefile.am, munch.h, unmunch.h: add missing munch.h and unmunch.h
|
|
Reported by Björn Jacke and Khaled Hosny (sf.net Bug ID 1684144)
|
|
* hunspell/hunspell.cxx, hunspell.hxx: fix --with-ui compiling error (add get_csconv())
|
|
Reported by Khaled Hosny (sf.net Bug ID 1685010)
|
|
|
|
2007-03-19 Németh László <nemeth at OOo>:
|
|
* csutil.cxx, hunspell/hunspell.cxx: Unicode non BMP area (>65K character range) support
|
|
(except conditional patterns and strip characters of affix rules)
|
|
* tests/utf8_nonbmp*: test data
|
|
|
|
* src/hunspell/*: add Mozilla patches from David Einstein
|
|
- run-time generated 8-bit character tables
|
|
- other Mozilla related changes (see Mozilla Bugzilla Bug 319778)
|
|
|
|
* csutil.cxx, affixmgr.cxx, hashmgr.cxx: optimized version of IGNORE feature
|
|
- IGNORE works with affixes (except strip characters and affix conditions)
|
|
* tests/ignore*: test data with latin characters
|
|
* tests/ignoreutf*: Unicode test data with Arabic diacritics (Harakat)
|
|
|
|
* src/hunspell/suggestmgr.cxx: new edit distance suggestion methods
|
|
- capitalization: nasa -> NASA
|
|
- long swap: permenant -> permanent
|
|
- long mov.: Ghandi -> Gandhi
|
|
- double two characters: vacacation -> vacation
|
|
* tests/sug.*: test data
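One of the methods listed above, the "double two characters" case, can be sketched as follows. The function name doubledPairCandidates is hypothetical and the sketch works on 8-bit text; it is not the code in suggestmgr.cxx.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Candidates for an accidentally doubled two-character sequence:
// where the pattern "abab" occurs, remove one "ab" (vacacation -> vacation).
std::vector<std::string> doubledPairCandidates(const std::string& word) {
    std::vector<std::string> out;
    for (std::size_t i = 0; i + 3 < word.size(); ++i) {
        if (word[i] == word[i + 2] && word[i + 1] == word[i + 3]) {
            std::string c = word;
            c.erase(i, 2);
            // Overlapping matches produce the same candidate; keep it once.
            if (out.empty() || out.back() != c) out.push_back(c);
        }
    }
    return out;
}
```

As with the other edit-distance methods, a candidate only becomes a suggestion if it is a correct word.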
|
|
|
|
* src/hunspell/affixmgr.cxx: space in REP strings (alot -> a lot)
|
|
Note: an underscore character represents the space in REP strings: REP alot a_lot, and
|
|
put the expression with space ("a lot") into the dic file (see tests/sug).
|
|
|
|
* hashmgr.cxx, affixmgr.cxx: ignore Unicode byte order mark (BOM sequence)
|
|
* tests/utf8_bom*: test data
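Ignoring a UTF-8 byte order mark amounts to skipping the three bytes EF BB BF at the start of the first line. A minimal sketch follows; the function name stripUtf8Bom is hypothetical.

```cpp
#include <string>

// Drop a leading UTF-8 byte order mark (bytes EF BB BF), if present.
std::string stripUtf8Bom(const std::string& line) {
    if (line.compare(0, 3, "\xEF\xBB\xBF") == 0) return line.substr(3);
    return line;
}
```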
|
|
|
|
* hunspell/*.cxx: OOo Issue 68903 - Make lingucomponent warning-free on wntmsci10
|
|
- fix Hunspell related warning messages on Windows platform (except some assignment
|
|
within conditional expressions). Reported and started by Stephan Bergmann.
|
|
|
|
* hunspell/affixmgr.cxx: fix OOo Issue 66683 - hunspell dmake debug=x fails
|
|
- Reported by Stephan Bergmann.
|
|
|
|
* src/hunspell/hunspell.[ch]xx: thread safe API for Hunspell executable
|
|
(removing prev*() functions, new spell(word, info, root) function)
|
|
|
|
* configure.ac, src/hunspell/*: HUNSPELL_EXPERIMENTAL code
|
|
--with-experimental configure option (conditional compiling of morphological analyser
|
|
and stemmer tools)
|
|
|
|
* configure.ac, src/hunspell/*: conditional Hunspell warning messages
|
|
--with-warnings configure option
|
|
|
|
* affixmgr.cxx: new, optimized parsing functions
|
|
|
|
* affixmgr.cxx: fix homonym handling for German dictionary project,
|
|
reported by Björn Jacke (sf.net Bug ID 1592880).
|
|
* tests/1592880.*: test data by Björn Jacke
|
|
|
|
* src/hunspell/affixmgr.cxx: fix CIRCUMFIX suggestion
|
|
Bug reported by Erdal Ronahi.
|
|
|
|
* hunspell.cxx: reverse root word output (complex prefixes)
|
|
Bug reported by Munzir Taha.
|
|
|
|
* tools/hunspell.cxx: fix Emacs compatibility, patch by marot at sf.net
|
|
- no % command in PIPE mode (SourceForge BugTracker 1595607)
|
|
- fix HUNSPELL_VERSION string
|
|
|
|
* suggestmgr.[hc]xx: rename check() functions to checkword() (OOo Issue 68296)
|
|
adopt MySpell patch by Bryan Petty (tierra at ooo) for Hunspell source
|
|
|
|
* csutil.cxx, munch.c, unmunch.c: adopt relevant parts of the MinGW patch
|
|
(OOo Issue 42504) by tonal at ooo
|
|
|
|
* affixmgr.cxx: remove double candidate_check() call, reported by Bram Moolenaar
|
|
|
|
* tests/test.sh: add LC_ALL="C" environment. Locale dependency of make check
|
|
reported by Gentoo project.
|
|
|
|
* src/tools/hunspell.cxx: UTF-8 highlighting fix for console UI
|
|
(not solved: breaking long UTF-8 lines)
|
|
|
|
* src/tools/unmunch.c: fix bad generation if strip is shorter than condition,
|
|
reported by Davide Prina
|
|
* src/tools/unmunch.h: increase 5000 -> 500000
|
|
|
|
* src/tools/hunspell.cxx: fix memory error in suggestion (uninitialized parameter),
|
|
Bug also reported by Björn Jacke in SourceForge Bug 1469957
|
|
|
|
* csutil.cxx, affixmgr.cxx: fix Caolan McNamara's patch for non OOo environment
|
|
|
|
2006-11-11 Caolan McNamara <cmc at OO.o>:
|
|
* csutil.cxx, affixmgr.cxx: UTF-8 table patch (OOo Issue 71449)
|
|
Description: memory optimization (OOo doesn't use the large UTF-8 table).
|
|
|
|
* Makefile.am: shared library patch (Sourceforge ID 1610756)
|
|
|
|
* hunspell.h, hunspell.cxx: C API patch (Sourceforge ID 1616353)
|
|
|
|
* hunspell.pc: pkgconfig patch (Sourceforge ID 1639128)
|
|
|
|
2006-10-17 Ryan Jones <at Mozilla Bugzilla>:
|
|
* affixmgr.cxx: missing fclose(affixlst) calls
|
|
Reported by <gavins at ooo> in OOo Issue 70408
|
|
|
|
2007-07-11 Taha Zerrouki <taha at gawab>:
|
|
* affixmgr.cxx, hunspell.cxx, hashmgr.cxx, csutil.cxx: IGNORE feature to remove
|
|
optional Arabic and other characters from input and dictionary words.
|
|
* src/hunspell/langnum.hxx: add Arabic language number, lang_ar=96
|
|
* tests/ignore.*: test data
|
|
|
|
2006-05-28 Miha Vrhovnik <mvrhov at users.sourceforge>:
|
|
* src/win_api/*: C API for Windows DLLs
|
|
- also Delphi text editor example (see on Hunspell Sourceforge page)
|
|
|
|
2006-05-18 Kevin F. Quinn <kevquinn at gentoo>:
|
|
* utf_info.cxx: struct -> static struct
|
|
Shared library patch also developed by Gentoo developers (Hanno Meyer-Thurow,
|
|
Diego Pettenò, Kevin F. Quinn)
|
|
|
|
2006-02-02 Németh László <nemethl@gyorsposta.hu>:
|
|
* src/hunspell/hunspell.cxx: suggest(): replace "fooBar" -> "foo bar" suggestions
|
|
with "fooBar" ->"foo Bar" (missing spaces are typical OCR bugs).
|
|
Bug reported by stowrob at OOo in Issue 58202.
|
|
* src/hunspell/suggestmgr.cxx: twowords(): permit 1-character words.
|
|
(restore MySpell's original behavior). Here: "aNew" -> "a New".
|
|
* tests/i58202.*: test data
|
|
|
|
* src/parsers/textparser.cxx: fix Unicode tokenization in is_wordchar()
|
|
(extra word characters (WORDCHARS) didn't work on big-endian platforms).
|
|
|
|
* src/hunspell/{csutil,affixmgr}.cxx: inline isSubset(), isRevSubset():
|
|
small speed optimization for languages with rich morphology.
|
|
|
|
* src/tools/hunspell.cxx: fix bad --with-ui and --with-readline compiling
|
|
when (N)curses is missing. Reported by Daniel Naber.
|
|
|
|
2006-01-19 Tor Lillqvist <tml@novell.com>
|
|
* src/hunspell/csutil.cxx: mystrsep(): fix locale-dependent isspace() tokenization
|
|
|
|
2006-01-06 András Tímár <timar@fsf.hu>
|
|
* src/hunspell/{hashmgr.hxx,hunspell.cxx}: fix Visual C++ compiling errors
|
|
|
|
2006-01-05 Németh László <nemethl@gyorsposta.hu>:
|
|
* COPYING: set GPL/LGPL/MPL tri-license for Mozilla integration.
|
|
Rationale: Mozilla source code contains an old MySpell version
|
|
with GPL/LGPL/MPL tri-license. (MPL license is a copyleft license, similar
|
|
to the LGPL, but it acts on file level.)
|
|
* COPYING.LGPL: GNU Lesser General Public License 2.1 (LGPL)
|
|
* COPYING.MPL: Mozilla Public License 1.1 (MPL)
|
|
* license.hunspell, src/hunspell/license.hunspell: GPL/LGPL/MPL tri-license
|
|
|
|
* src/hunspell/{affixmgr,hashmgr}.*: AF, AM alias definitions in affix file:
|
|
compression of flag sets and morphological descriptions (see manual,
|
|
and tests/alias* test files).
|
|
Rationale: Alias compression is also good for loading time and memory
|
|
efficiency, not only smaller resources.
|
|
* src/tools/makealias: alias compression utility
|
|
(usage: ./makealias file.dic file.aff)
|
|
* tests/alias{,2,3}: AF, AM tests
|
|
* man/hunspell.4: add AF, AM documentation
|
|
* src/hunspell/affentry.cxx, atypes.hxx: add new opts bits (aeALIASM, aeALIASF)
|
|
|
|
* tools/hunspell, src/parser/*, src/hunspell/*: Hunspell program
|
|
tokenizes Unicode texts (only with UTF-8 encoded dictionaries).
|
|
Missing Unicode tokenization reported by Björn Jacke, Egmont Koblinger,
|
|
Jess Body and others.
|
|
Note: Curses interactive interface doesn't work perfectly yet.
|
|
* tests/*.tests: remove -1 parameters of Hunspell
|
|
* tests/*.{good,wrong}: remove tabulators
|
|
|
|
* src/hunspell/{hunspell,affixmgr}.cxx: BREAK option: break words at
|
|
specified break points and checking word parts separately (see manual).
|
|
Note: COMPOUNDRULE is better (or will be better) for handling dashes and
|
|
other compound joining characters or character strings. Use BREAK, if you
|
|
want to check words with dashes or other joining characters and there is no time
|
|
or possibility to describe precise compound rules with COMPOUNDRULE.
|
|
* tests/break.*: BREAK example.
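The idea of BREAK, checking word parts separately, can be sketched as below. This is only an illustration under simplifying assumptions: the dictionary is a plain std::set, the break points are literal strings, and spellWithBreaks is a hypothetical name rather than the function used in hunspell.cxx.

```cpp
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Accept `word` if it is in `dict`, or if splitting it at some break string
// gives two non-empty parts that are both accepted.
bool spellWithBreaks(const std::string& word,
                     const std::set<std::string>& dict,
                     const std::vector<std::string>& breaks) {
    if (dict.count(word) > 0) return true;
    for (const std::string& br : breaks) {
        if (br.empty()) continue;
        std::size_t pos = word.find(br);
        while (pos != std::string::npos) {
            const std::string left = word.substr(0, pos);
            const std::string right = word.substr(pos + br.size());
            // Both parts are shorter than `word`, so the recursion ends.
            if (!left.empty() && !right.empty() &&
                spellWithBreaks(left, dict, breaks) &&
                spellWithBreaks(right, dict, breaks)) {
                return true;
            }
            pos = word.find(br, pos + 1);
        }
    }
    return false;
}
```

With the dictionary {foo, bar, baz} and the break string "-", the word "foo-bar-baz" is accepted and "foo-qux" is rejected.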
|
|
|
|
* src/hunspell/{affixmgr,hunspell}.cxx: add CHECKSHARPS declaration instead
|
|
of LANG de_DE definitions to handle German sharp s in both spelling and
|
|
suggestion.
|
|
* src/hunspell/hunspell.cxx: With CHECKSHARPS, uppercase words are valid
|
|
with both lower sharp s (it is optional for names in German legal texts)
|
|
and SS (MÜßIG, MÜSSIG). Missing lower sharp s form reported by Björn Jacke.
|
|
* src/hunspell/hunspell.cxx: KEEPCASE flag on a sharp s word has a special
|
|
meaning with CHECKSHARPS declaration: KEEPCASE permits capitalisation and SS upper
|
|
casing of a sharp s word (Müßig and MÜSSIG), but forbids the upper cased form
|
|
with lower sharp s character(s): *MÜßIG.
|
|
* tests/germancompounding*: add CHECKSHARPS, remove LANG
|
|
* tests/checksharps*: add CHECKSHARPS and KEEPCASE, remove LANG
|
|
|
|
* src/hunspell/hunspell.cxx: improved suggestions:
|
|
- suggestions for pressed Caps Lock problems: macARONI -> macaroni
|
|
- suggestions for long shift problems: MAcaroni -> Macaroni, macaroni
|
|
- suggestions for KEEPCASE words: KG -> kg
|
|
* src/hunspell/csutil.cxx: fix mystrrep() function:
|
|
- suggestions for lower sharp s in uppercased words: MÜßIG -> MÜSSIG
|
|
* tests/checksharps{,utf}.sug: add tests for mystrrep() fix
|
|
|
|
* src/hunspell/hashmgr.cxx: Now dictionary words can contain slashes
|
|
with the "\/" syntax. Problem reported by Frederik Fouvry.
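A minimal sketch of splitting a dictionary line with this escape syntax, assuming that the first unescaped slash starts the flag field. DicEntry and parseDicLine are hypothetical names, not the parser in hashmgr.cxx.

```cpp
#include <cstddef>
#include <string>

// A dictionary line split into the word and its (possibly empty) flag field.
struct DicEntry {
    std::string word;
    std::string flags;
};

// Split at the first unescaped '/', turning "\/" into a literal '/'.
DicEntry parseDicLine(const std::string& line) {
    DicEntry e;
    for (std::size_t i = 0; i < line.size(); ++i) {
        if (line[i] == '\\' && i + 1 < line.size() && line[i + 1] == '/') {
            e.word += '/';
            ++i;  // skip the escaped slash
        } else if (line[i] == '/') {
            e.flags = line.substr(i + 1);
            break;
        } else {
            e.word += line[i];
        }
    }
    return e;
}
```

For the line "km\/h/A" the sketch yields the word "km/h" with the flag field "A".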
|
|
|
|
* src/hunspell/hunspell.cxx: fix bad duplicate filter in suggest().
|
|
(Suggesting some capitalised compound words caused program crash
|
|
with Hungarian dictionary, OOo Issue 59055).
|
|
|
|
* src/hunspell/affixmgr.cxx: fix bad defcpd_check() call in compound_check().
|
|
(Overlapping new COMPOUNDRULE and old compounding methods caused program
|
|
crash at suggestion.)
|
|
|
|
* src/hunspell/affixmgr.{cxx,hxx}: check affix flag duplication at affix classes.
|
|
Suggested by Daniel Naber.
|
|
|
|
* src/hunspell/affentry.cxx: remove unused variable declarations (OOo i58338).
|
|
Compiler warnings reported by András Tímár and Martin Hollmichel.
|
|
|
|
* src/hunspell/hunspell.cxx: morph(): not analyse bad mixed uppercased forms
|
|
(fix Arabic morphological analysis with Buckwalter's Arabic transliteration)
|
|
|
|
* src/hunspell/affentry.{cxx,hxx}, atypes.hxx: little memory optimization
|
|
in affentry:
|
|
- using unsigned char fields instead of short (stripl, appndl, numconds)
|
|
- rename xpflg field to opts
|
|
- removing utf8 field, use aeUTF8 bit of opts field
|
|
|
|
* configure.ac: set tests/maputf.test to XFAILED on ARM platform.
|
|
Fail reported by Rene Engelhard.
|
|
|
|
* configure.ac: link Ncursesw library, if exists.
|
|
|
|
* BUGS: add BUGS file
|
|
|
|
* tests/complexprefixes2.*: test for morphological analysis with COMPLEXPREFIXES
|
|
|
|
* src/hunspell/affixmgr.cxx: use "COMPOUNDRULE" instead of
|
|
"COMPOUND". The new name suggested by Bram Moolenaar.
|
|
* tests/compoundrule*: modified and renamed compound.* test files
|
|
|
|
* man/hunspell.4: AF, AM, BREAK, CHECKSHARPS, COMPOUNDRULE, KEEPCASE.
|
|
- also new addition to the documentation:
|
|
The header of the dictionary file defines the approximate dictionary size:
|
|
``A dictionary file (*.dic) contains a list of words, one per line.
|
|
The first line of the dictionaries (except personal dictionaries)
|
|
contains the _approximate_ word count (for optimal hash memory size).''
|
|
Asked by Frederik Fouvry.
|
|
|
|
One-character replacements in REP definitions: ``It's very useful to
|
|
define replacements for the most typical one-character mistakes, too:
|
|
with REP you can add higher priority to a subset of the TRY suggestions
|
|
(suggestion list begins with the REP suggestions).''
|
|
|
|
2005-11-11 Németh László <nemethl@gyorsposta.hu>:
|
|
* src/hunspell/affixmgr.*: fix Unicode MAP errors (sorted only n-1
|
|
characters instead of n ones in UTF-16 MAP character lists).
|
|
Bug reported by Rene Engelhard.
|
|
|
|
* src/hunspell/affixmgr.*: fix infinite COMPOUND matching (default char
|
|
type is unsigned on PowerPC, s390 and ARM platforms and it will never
|
|
be negative). Bug reported by Rene Engelhard.
|
|
|
|
* src/hunspell/{affixmgr,suggestmgr}.cxx: fix bad ONLYINCOMPOUND
|
|
word suggestions.
|
|
* tests/onlyincompound.sug: empty test file to check this fix.
|
|
Bug reported by Björn Jacke.
|
|
|
|
* src/hunspell/affixmgr.cxx: fix backtracking in COMPOUND pattern matching.
|
|
* tests/compound6.*: test files to check this fix.
|
|
|
|
* csutil.cxx: set bigger range types in flag_qsort() and flag_bsearch().
|
|
|
|
* affixmgr.hxx: set better type for cont_classes[] Boolean data (short -> char)
|
|
|
|
* configure.ac, tests/automake.am: set platform specific XFAIL test
|
|
(flagutf8.test on ARM platform)
|
|
|
|
2005-11-09 Németh László <nemethl@gyorsposta.hu>:
|
|
improvements:
|
|
* src/hunspell/affixmgr.*: new and improved affix file parameters:
|
|
|
|
- COMPOUND definitions: compound patterns with regexp-like matching.
|
|
See manual and test files: tests/compound*.*
|
|
Suggested by Bram Moolenaar.
|
|
Also useful for simple word-level lexical scanning, for example
|
|
analysing numbers or words with numbers (OOo Issue #53643):
|
|
http://qa.openoffice.org/issues/show_bug.cgi?id=53643
|
|
Examples: tests/compound{4,5}.*.
|
|
|
|
- NOSUGGEST flag: words marked with the NOSUGGEST flag are not suggested.
|
|
Proposed flag for vulgar and obscene words (OOo Issue #55498).
|
|
Example: tests/nosuggest.*.
|
|
Problem reported by bobharvey at OOo:
|
|
http://qa.openoffice.org/issues/show_bug.cgi?id=55498
|
|
|
|
- KEEPCASE flag: Forbid capitalized and uppercased forms of words
|
|
marked with the KEEPCASE flag. Useful for special orthographies
|
|
(measurements and currency often keep their case in uppercased
|
|
texts) and other writing systems (eg. keeping lower case of IPA
|
|
characters).
|
|
|
|
- CHECKCOMPOUNDCASE: Forbid upper case characters at word boundaries in compounds.
|
|
Examples: tests/checkcompoundcase* and tests/germancompounding.*
|
|
|
|
- FLAG UTF-8: New flag type: Unicode character encoded with UTF-8.
|
|
Example: tests/flagutf8.*.
|
|
Rationale: Unicode character type can be more readable
|
|
(in a Unicode text editor) than `long' or `num' flag type.
|
|
|
|
bug fixes:
|
|
* src/hunspell/hunspell.cxx: accept numbers and numbers with separators (i53643)
|
|
Bug reported by skelet at OOo:
|
|
http://qa.openoffice.org/issues/show_bug.cgi?id=53643
|
|
|
|
* src/hunspell/csutil.cxx: fix casing data in ISO 8859-13 character table.
|
|
|
|
* src/hunspell/csutil.cxx: add ISO-8859-15 character encoding (i54980)
|
|
Rationale: ISO-8859-15 is the default encoding of the French OpenOffice.org
|
|
dictionary. ISO-8859-15 is a modified version of ISO-8859-1
|
|
(latin-1) character encoding with French œ ligatures and euro
|
|
symbol. Problem reported by cbrunet at OOo in OOo Issue 54980:
|
|
http://qa.openoffice.org/issues/show_bug.cgi?id=54980
|
|
|
|
* src/hunspell/affixmgr.cxx: fix zero-byte malloc after a bad affix header.
|
|
Patch by Harri Pitkänen.
|
|
|
|
* src/hunspell/suggestmgr.cxx: fix bad NEEDAFFIX word suggestion
|
|
in ngram suggestions. Reported by Daniel Naber and Friedel Wolff.
|
|
|
|
* src/hunspell/hashmgr.cxx: fix bad white space checking in affix files.
|
|
src/hunspell/{csutil,affixmgr}.cxx: add other white space separators.
|
|
Problems with tabulators reported by Frederik Fouvry.
|
|
|
|
* src/hunspell/*: replace system-dependent <license.*> #include
|
|
parameters with quoted ones. Problem reported by Dafydd Jones.
|
|
|
|
* src/hunspell/hunspell.cxx: fix missing morphological analysis of dot(s)
|
|
Reported by Trón Viktor.
|
|
|
|
changes:
|
|
* src/hunspell/affixmgr.cxx: rename PSEUDOROOT to NEEDAFFIX.
|
|
Suggested by Bram Moolenaar.
|
|
|
|
* src/hunspell/suggestmgr.hxx: Increase default maximum of
|
|
ngram suggestions (3->5). Suggested by Kevin Hendricks.
|
|
|
|
* src/hunspell/htypes.hxx: Increase MAXDELEN for long affix flags.
|
|
|
|
* src/hunspell/suggestmgr.cxx: modify (perhaps fix) Unicode map suggestion.
|
|
tests/maputf test fail on ARM platform reported by Rene Engelhard.
|
|
|
|
* src/hunspell/{affentry.cxx,atypes.hxx}: remove [PREFIX] and
|
|
MISSING_DESCRIPTION messages from morphological analysis.
|
|
Problems reported by Trón Viktor.
|
|
|
|
* tests/germancompounding.{aff,good}: Add "Computer-Arbeit" test word.
|
|
Suggested by Daniel Naber.
|
|
|
|
* doc/man/hunspell.4: Proof-reading patch by Goldman Eleonóra.
|
|
|
|
* doc/man/hunspell.4: Fix bad affix example (replace `move' with `work').
|
|
Bug reported by Frederik Fouvry.
|
|
|
|
* tests/*: new test files:
|
|
affixes.*: simple affix compression example from the hunspell(4) manual page
|
|
checkcompoundcase.*, checkcompoundcase2.*, checkcompoundcaseutf.*
|
|
compound.*, compound2.*, compound3.*, compound4.*, compound5.*
|
|
compoundflag.* (former compound.*)
|
|
flagutf8.*: test for FLAG UTF-8
|
|
germancompounding.*: simplification with CHECKCOMPOUNDCASE.
|
|
germancompoundingold.* (former germancompounding.*)
|
|
i53643.*: check numbers with separators
|
|
i54980.*: ISO8859-15 test
|
|
keepcase.*: test for KEEPCASE
|
|
needaffix*.* (former pseudoroot*.* tests)
|
|
nosuggest.*: test for NOSUGGEST
|
|
|
|
2005-09-19 Németh László <nemethl@gyorsposta.hu>:
|
|
* src/hunspell/suggestmgr.cxx: improved ngram suggestion:
|
|
- detect non-neighboring swapped characters (pernament -> permanent)
|
|
Rationale: the ngram method has a significant weakness with non-neighboring
|
|
swapped characters, especially when the swap is in the middle of the word.
|
|
- suggest uppercase forms (unesco -> UNESCO, siggraph's -> SIGGRAPH's)
|
|
- suggest only the ngram swapped-character and uppercase forms, if they exist.
|
|
Rationale: swapped-character and casing equivalents give much better
|
|
suggestions than any other (weighted) ngram suggestions.
|
|
- add uppercase suggestion (PERMENANT -> PERMANENT)
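
A minimal sketch of the non-neighboring swap test described above. It assumes an 8-bit encoding (it compares bytes, not characters) and is not the code in suggestmgr.cxx. The name is_distant_swap is hypothetical.

```cpp
#include <cstddef>
#include <string>

// Returns true if `candidate` is `word` with exactly two characters
// exchanged, and those two characters are NOT adjacent
// (for example "pernament" and "permanent").
inline bool is_distant_swap(const std::string& word,
                            const std::string& candidate) {
    if (word.size() != candidate.size()) return false;
    std::size_t first = 0, second = 0, diffs = 0;
    for (std::size_t i = 0; i < word.size(); ++i) {
        if (word[i] != candidate[i]) {
            if (diffs == 0) first = i;
            else if (diffs == 1) second = i;
            if (++diffs > 2) return false;  // more than one swap
        }
    }
    // Exactly two mismatches, not neighbors, and they mirror each other.
    return diffs == 2 && second - first > 1 &&
           word[first] == candidate[second] &&
           word[second] == candidate[first];
}
```

A suggester could run such a test on each dictionary candidate of the same length and rank a match above ordinary weighted ngram results, which is the preference the entry describes.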
|
|
|
|
* src/hunspell/*: complete comparison with MySpell 3.2 (in OOo beta 2):
|
|
- affixmgr.cxx: add missing numrep initialization
|
|
- hashmgr.cxx: add_word(): don't allocate temporary records
|
|
- hunspell.cxx: in suggest():
|
|
- check capitalized words first (better sug. order for proper names),
|
|
- check pSMgr->suggest() return value
|
|
- make the pSMgr->suggest() call non-optional in HUHCAP
|
|
- csutil.cxx: fix bad KOI8-U -> koi8r_tbl reference in enc_entry encds
|
|
- csutil.cxx: fix casing data in ISO 8859-2, Windows 1251 and KOI8-U
|
|
encoding tables. Bug reported by Dmitri Gabinski.
|
|
|
|
* src/hunspell/affixmgr.*: improved compound word and other features
|
|
- generalize hu_HU specific compound word features with new affix file
|
|
parameters, suggested by Bram Moolenaar:
|
|
- CHECKCOMPOUNDDUP: forbid word duplication in compounds (eg. foo|foo)
|
|
- CHECKCOMPOUNDTRIPLE: forbid triple letters in compounds (eg. foo|obar)
|
|
- CHECKCOMPOUNDPATTERN: forbid patterns at word bounds in compounds
|
|
- CHECKCOMPOUNDREP: using REP replacement table, forbid presumably bad
|
|
compounds (useful for languages with unlimited number of compounds)
|
|
- ONLYINCOMPOUND flag works also with words (see tests/onlyincompound.*)
|
|
Suggested by Daniel Naber, Björn Jacke, Trón Viktor & Bram Moolenaar.
|
|
- PSEUDOROOT works also with prefixes and prefix + suffix combinations
|
|
(see tests/pseudoroot5.*). Suggested by Trón Viktor.
|
|
- man/hunspell.4: updated man page
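
A minimal sketch of the checks behind CHECKCOMPOUNDDUP and CHECKCOMPOUNDTRIPLE, assuming a compound already split into two parts and an 8-bit encoding. The function names are hypothetical and the real logic in affixmgr.cxx may differ.

```cpp
#include <cstddef>
#include <string>

// CHECKCOMPOUNDDUP idea: two neighboring parts must not be identical
// (foo|foo is rejected).
inline bool is_duplicated(const std::string& left, const std::string& right) {
    return left == right;
}

// CHECKCOMPOUNDTRIPLE idea: joining the parts must not produce three
// identical letters across the boundary (foo|obar gives "ooo").
inline bool has_triple_at_boundary(const std::string& left,
                                   const std::string& right) {
    const std::string joined = left + right;
    const std::size_t b = left.size();  // index of the first char of `right`
    // Inspect each 3-character window that spans the boundary.
    for (std::size_t start = (b >= 2 ? b - 2 : 0);
         start < b && start + 2 < joined.size(); ++start) {
        if (joined[start] == joined[start + 1] &&
            joined[start + 1] == joined[start + 2]) {
            return true;
        }
    }
    return false;
}
```

With these helpers, has_triple_at_boundary("foo", "obar") reports a forbidden compound, while has_triple_at_boundary("foo", "bar") does not.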
|
|
|
|
* src/hunspell/affixmgr.*: fix incomplete prefix handling with twofold
|
|
suffixes (delete unnecessary contclasses[] conditions in
|
|
prefix_check_twosfx() and prefix_check_twosfx_morph()).
|
|
Bug reported by Trón Viktor.
|
|
|
|
* src/hunspell/affixmgr.*: complete also *_morph() functions with
|
|
conditions of new Hunspell features (circumfix, pseudoroot etc.).
|
|
|
|
* src/hunspell/suggestmgr.cxx:
|
|
- fix missing suggestions for words with crossed prefix and suffix
|
|
- fix redundant non compound word checking
|
|
- fix losing suggestions problem. Bug reported by Dmitri Gabinski.
|
|
|
|
* src/hunspell/dictmgr.*:
|
|
- add new dictionary manager for Hunspell UNO module
|
|
Problems with eo_ANY Esperanto locale reported by Dmitri Gabinski.
|
|
|
|
* src/hunspell/*: use precise constant sizes for 8-bit and 16-bit character
|
|
arrays with MAXWORDUTF8LEN and MAXSWUTF8L macros.
|
|
|
|
* src/hunspell/affixmgr.cxx: fix bad MAXNGRAMSUGS parameter handling
|
|
|
|
* src/hunspell/affixmgr.cxx, src/tools/{un}munch.*: fix GCC 4.0 warnings
|
|
on fgets(), reported by Dvornik László
|
|
|
|
* po/hu.po: improved translation by Dvornik László
|
|
|
|
* tests/test.sh: improved test environment
|
|
- add suggestion testing (see tests/*.sug)
|
|
- add memory debugging environment, based on the excellent Valgrind debugger.
|
|
Usage on Linux and experimental platforms of Valgrind:
|
|
VALGRIND=memcheck make check
|
|
- rename test_hunmorph to test.sh
|
|
|
|
* tests/*: new tests:
|
|
- base.*: base example based on MySpell's checkme.lst.
|
|
- map{,utf}.*, rep{,utf}: MAP and REP suggestion examples
|
|
- tests on new CHECKCOMPOUND, ONLYINCOMPOUND and PSEUDOROOT features
|
|
- i54633.*: capitalized suggestion test for Issue 54633 from OOo's Issuezilla
|
|
- i35725.*: improved ngram suggestion test for Issue 35725
|
|
|
|
2005-08-26 Németh László <nemethl@gyorsposta.hu>:
|
|
improvements:
|
|
|
|
* src/hunspell/suggestmgr.cxx:
|
|
Unicode support in related character map suggestion
|
|
|
|
* src/hunspell/suggestmgr.cxx: Unicode support in ngram suggestion
|
|
|
|
* src/hunspell/{suggestmgr,affixmgr,hunspell}.cxx: improve ngram suggestion.
|
|
Fix http://qa.openoffice.org/issues/show_bug.cgi?id=35725. See release
|
|
notes for examples. Problem reported by beccablain at OOo.
|
|
- ngram suggestions now are case insensitive (see `Permenant' bug in Issuezilla)
|
|
- weight ngram suggestions (with the longest common subsequence algorithm,
|
|
also considering lengths of bad word and suggestion, identical first
|
|
letters and almost completely identical character positions)
|
|
- set strict affix congruency in expand_rootword(). Now ngram suggestions
|
|
are good for languages with rich morphology and also better for English.
|
|
Rationale: affixed forms of the first ngram suggestion
|
|
very often suppress the second and subsequent root word suggestions. But
|
|
errors in affixes are less common, and can be fixed without suggestions.
|
|
We must prefer the more informative second and subsequent root word
|
|
suggestions instead of the suggestions for bad affixes.
|
|
- a better suggestion may not be substring of a less good suggestion
|
|
Rationale: Suggesting affixed forms of a root word is
|
|
unnecessary when the root word has a better weighted ngram value.
|
|
(Checking substrings is a good approximation for this refinement.)
|
|
- fewer ngram suggestions (default 3 maximum instead of 10)
|
|
Rationale: Users need a big extra effort to check a lot of bad ngram
|
|
suggestions, nine times out of ten unnecessarily. It is very
|
|
distracting, because ngram suggestions could be very different.
|
|
Usually Myspell and Hunspell suggest one or two suggestions with
|
|
the old suggestion algorithms (maximum is 15), but the ngram algorithm
|
|
often gives the maximum number of suggestions. With strict affix congruency
|
|
and other refinements, the good suggestion is usually among the
|
|
first three elements.
|
|
- new affix parameter: MAXNGRAMSUG
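
A minimal sketch of the longest common subsequence length that the weighting above builds on, using the textbook dynamic-programming algorithm over bytes. The name lcs_length is hypothetical. The additional factors listed in the entry (word lengths, identical first letters, matching character positions) are not modeled here.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Length of the longest common subsequence of `a` and `b`.
// Uses two rolling rows, so memory is proportional to b.size().
inline std::size_t lcs_length(const std::string& a, const std::string& b) {
    std::vector<std::size_t> prev(b.size() + 1, 0), cur(b.size() + 1, 0);
    for (std::size_t i = 1; i <= a.size(); ++i) {
        for (std::size_t j = 1; j <= b.size(); ++j) {
            cur[j] = (a[i - 1] == b[j - 1])
                         ? prev[j - 1] + 1
                         : std::max(prev[j], cur[j - 1]);
        }
        std::swap(prev, cur);  // the finished row becomes the previous row
    }
    return prev[b.size()];
}
```

A higher value for a candidate relative to the misspelled word would raise that candidate's weight. How the value is combined with the other factors is left open in this sketch.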
|
|
|
|
* src/hunspell/*: support agglutinative languages with rich prefix
|
|
morphology or with right-to-left writing system (for example, Turkic
|
|
and Austronesian languages with (modified) Arabic scripts).
|
|
- new affix parameter: COMPLEXPREFIXES
|
|
Set twofold prefix stripping (but single suffix stripping)
|
|
* src/hunspell/affixmgr.cxx:
|
|
- speed up prefix loading with tree sorting algorithm.
|
|
* tests/complexprefixes.*, tests/complexprefixesutf.*:
|
|
Coptic example posted by Moheb Mekhaiel
|
|
|
|
* src/hunspell/hashmgr.cxx: check size attribute in dic file
|
|
suggested by Daniel Naber
|
|
Rationale: With a missing size attribute, Hunspell allocates a too small and
|
|
slower hash memory, and Hunspell can lose the first dictionary word.
|
|
|
|
* src/hunspell/affixmgr.cxx: check stripping characters and condition
|
|
compatibility in affix rules (bugs detected in cs_CZ, es_ES, es_NEW,
|
|
es_MX, lt_LT, nn_NO, pt_PT, ro_RO and sk_SK dictionaries). See release
|
|
notes of Hunspell 1.0.9 in NEWS.
|
|
|
|
* src/hunspell/affixmgr.cxx: check unnecessary fields in affix rules
|
|
(bugs detected in ro_RO and sv_SE dictionaries). See release notes.
|
|
|
|
* src/hunspell/affixmgr.cxx: remove redundant condition checking
|
|
in affix rules with stripping characters (redundancy in OpenOffice.org
|
|
dictionaries reported by Eleonóra Goldman)
|
|
Rationale: this is a little optimization, but it was excellent for
|
|
detecting bad ngram affixation with bad or weak affix conditions.
|
|
|
|
* tests/germancompounding.aff: improve compound definition
|
|
- use dash prefix instead of language specific tokenizer
|
|
Rationale: Using uniform approach is the right way to check and analyze
|
|
compound words. Language-specific word breaking is deprecated; it needs
|
|
a sophisticated grammar checking for word-like word pairs
|
|
(for example in Hungarian there is a substandard, but accepted
|
|
syntax with dash for word pairs: cats, dogs -> kutyák-macskák (like
|
|
cats/dogs in English)).
|
|
|
|
* test Hunspell with 54 OpenOffice.org dictionaries: see release notes
|
|
|
|
bug fixes:
|
|
|
|
* src/hunspell/suggestmgr.*: add time limit to exponential
|
|
algorithm of the related character map suggestion
|
|
Rationale: a long word in agglutinative languages or a special pattern
|
|
(for example a horizontal rule) made of map characters can `crash' the
|
|
spell checker.
|
|
|
|
* src/hunspell/affentry.cxx: add() functions: fix bad word generation
|
|
by checking stripping characters (see similar bug in unmunch)
|
|
|
|
* src/hunspell/affixmgr.cxx: parse_file(): fix unconditional getNext()
|
|
call for ~AffixMgr() when affix file is corrupt.
|
|
|
|
* src/hunspell/affixmgr.*: AffixMgr(), parse_cpdsyllable(): fix missing
|
|
string duplications for ~AffixMgr() when affix file is corrupt.
|
|
|
|
* src/hunspell/affixmgr.*: parse_affix(): fix fprintf() call when affix
|
|
file is corrupt. Bug reported by Daniel Naber.
|
|
|
|
* suggestmgr.cxx: replace single usage of 'strdup' with 'mystrdup'
|
|
patch by Chris Halls (debian.org)
|
|
|
|
* src/hunspell/makefile.mk: add makefile.mk for compiling in OpenOffice.org
|
|
See README in Hunspell UNO module.
|
|
Problems with separated compiling reported by Rene Engelhard
|
|
|
|
* src/hunspell/hunspell.cxx: fix pseudoroot support
|
|
- search for a non-pseudoroot homonym in check()
|
|
* tests/pseudoroot4.*: test this fix
|
|
|
|
* src/tools/unmunch.c: fix bad word generation when conditions
|
|
are shorter or incompatible with stripping characters in affix rules
|
|
|
|
* src/tools/unmunch.c: fix mychomp() for de_AT.dic and other dic files
|
|
without a final newline character.
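
A minimal sketch of a chomp routine that is safe for a last line with no terminator. It is written independently of unmunch.c, and the name chomp_line is hypothetical.

```cpp
#include <cstddef>
#include <cstring>

// Strip one trailing line terminator ("\n", "\r\n" or "\r") in place.
// A line without a terminator, such as the last line of a file that
// lacks a final newline, is left unchanged instead of losing its
// last character.
inline void chomp_line(char* s) {
    std::size_t n = std::strlen(s);
    if (n > 0 && s[n - 1] == '\n') s[--n] = '\0';
    if (n > 0 && s[n - 1] == '\r') s[--n] = '\0';
}
```

The point of the sketch is that the routine checks what the final character is before removing it, rather than unconditionally truncating the buffer by one byte.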
|
|
|
|
other changes:
|
|
* src/hunspell/suggestmgr.*: erase ACCENT suggestion
|
|
Rationale: ACCENT suggestion was the same as Kevin Hendricks's map
|
|
suggestion algorithm, but with a less good interface in affix file.
|
|
|
|
* src/hunspell/suggestmgr.*: combine cycle number limit
|
|
in badchar(), and forgotchar() with a time limit.
|
|
|
|
* src/hunspell/affixmgr.*: remove NOMAPSUGS affix parameter
|
|
|
|
* src/hunspell/{suggestmgr,hunspell}.*: strip periods from
|
|
suggestions (restore MySpell's original behaviour)
|
|
Rationale: OpenOffice.org has an automatic period handling mechanism
|
|
and suggestions look better without periods.
|
|
- new affix file parameter: SUGSWITHDOTS
|
|
Add period(s) to suggestions, if input word terminates in period(s).
|
|
(No need for OpenOffice.org dictionaries.)
|
|
|
|
* tests/germancompounding.aff: fix bad German affix in affix example
|
|
(computeren->computern). Suggested by Daniel Naber.
|
|
|
|
* src/tools/example.cxx: add Myspell's example
|
|
|
|
* src/tools/munch.cxx: add Myspell's munch
|
|
|
|
* man{,/hu}/hunspell.4: refresh manual pages
|
|
|
|
2005-08-01 Németh László <nemethl@gyorsposta.hu>:
|
|
* add missing MySpell files and features:
|
|
- add MySpell license.readme, README and CONTRIBUTORS ({license,README,AUTHORS}.myspell)
|
|
- add MySpell unmunch program (src/tools/unmunch.c)
|
|
- add licenses to source (src/hunspell/license.{myspell,hunspell})
|
|
- port MAP suggestion (with imperfect UTF-8 support)
|
|
- add NOSPLITSUGS affix parameter
|
|
- add NOMAPSUGS affix parameter
|
|
|
|
* src/man/man.4: MAP, COMPOUNDPERMITFLAG, NOSPLITSUGS, NOMAPSUGS
|
|
|
|
* src/hunspell/aff{entry,ixmgr}.cxx:
|
|
- improve compound word support
|
|
- new affix parameter: COMPOUNDPERMITFLAG (see manual)
|
|
* src/tests/compoundaffix{,2}.*: examples for COMPOUNDPERMITFLAG
|
|
* src/tests/germancompounding.*: new solution for German compounding
|
|
Problems with German compounding reported by Daniel Naber
|
|
|
|
* src/hunspell/hunspell.cxx: fix German uppercase word spelling
|
|
with the spellsharps() recursive algorithm.
|
|
Default recursive depth is 5 (MAXSHARPS).
|
|
* src/tests/germansharps*: extended German sharp s tests
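
A minimal sketch of a depth-limited recursive search over sharp s variants, in the spirit of the spellsharps() entry above. It assumes lowercase UTF-8 input and only enumerates candidates; the actual spellsharps() in hunspell.cxx also checks each candidate against the dictionary. The name sharp_s_variants and its signature are hypothetical.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Collect variants of `word` in which some "ss" pairs are replaced by
// sharp s. `depth` counts replacements made so far; once it reaches
// `max_depth` (5 would mirror the MAXSHARPS default) no further
// replacement is tried.
inline void sharp_s_variants(const std::string& word, std::size_t from,
                             int depth, int max_depth,
                             std::vector<std::string>& out) {
    const std::size_t pos =
        (depth < max_depth) ? word.find("ss", from) : std::string::npos;
    if (pos == std::string::npos) {
        out.push_back(word);  // nothing left to replace on this branch
        return;
    }
    std::string replaced = word;
    replaced.replace(pos, 2, "\xC3\x9F");  // UTF-8 sharp s, also two bytes
    sharp_s_variants(replaced, pos + 2, depth + 1, max_depth, out);
    sharp_s_variants(word, pos + 1, depth, max_depth, out);
}
```

Calling it with "strasse", from 0, depth 0 and max_depth 5 collects the sharp s spelling first and the unchanged "strasse" second; a checker would then test each candidate in turn.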
|
|
|
|
* src/tools/hunspell.cxx: fix fatal memory bug in non-interactive
|
|
subshells without HOME environment variable
|
|
Bug detected with PHP by András Izsók.
|
|
|
|
2005-07-22 Németh László <nemethl@gyorsposta.hu>:
|
|
* src/hunspell/csutil.hxx: utf16_u8()
|
|
- fix 3-byte UTF-8 character conversion
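
A minimal sketch of the UTF-8 encoding rules involved, assuming a single UTF-16 code unit from the Basic Multilingual Plane (surrogates are not handled). It follows the standard UTF-8 bit layout and is not the utf16_u8() implementation. The name bmp_to_utf8 is hypothetical.

```cpp
#include <cstdint>
#include <string>

// Encode one BMP code unit as UTF-8 (1, 2 or 3 bytes).
inline std::string bmp_to_utf8(std::uint16_t c) {
    std::string out;
    if (c < 0x80) {                    // 0xxxxxxx
        out += static_cast<char>(c);
    } else if (c < 0x800) {            // 110xxxxx 10xxxxxx
        out += static_cast<char>(0xC0 | (c >> 6));
        out += static_cast<char>(0x80 | (c & 0x3F));
    } else {                           // 1110xxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xE0 | (c >> 12));
        out += static_cast<char>(0x80 | ((c >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (c & 0x3F));
    }
    return out;
}
```

In the 3-byte branch the middle byte must be masked with 0x3F after shifting; omitting that mask is a common source of wrong output for code points at or above U+0800.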
|
|
|
|
2005-07-21 Németh László <nemethl@gyorsposta.hu>:
|
|
* src/hunspell/csutil.hxx: hunspell_version() for OOo UNO module
|
|
|
|
2005-07-19 Németh László <nemethl@gyorsposta.hu>:
|
|
* renaming:
|
|
- src/morphbase -> src/hunspell
|
|
- src/hunspell, src/hunmorph -> src/tools
|
|
- src/huntokens -> src/parsers
|
|
|
|
* src/tools/hunstem.cxx: add stemmer example
|
|
|
|
2005-07-18 Németh László <nemethl@gyorsposta.hu>:
|
|
* configure.ac: --with-ui, --with-readline configure options
|
|
* src/hunspell/hunspell.cxx: fix conditional compiling
|
|
|
|
* src/hunspell/hunspell.cxx: set HunSPELL.bak temporary file
|
|
in the same directory as the checked file.
|
|
|
|
* src/morphbase/morphbase.cxx:
|
|
|
|
- handling German sharp s (ß)
|
|
|
|
- fix (temporarily) analyize()
|
|
|
|
* tests: a lot of new tests
|
|
|
|
* po/, intl/, m4/: add gettext from GNU hello
|
|
|
|
* po/hu.po: add Hungarian translation
|
|
|
|
* doc/, man/: rename doc to man
|
|
|
|
2005-07-04 Németh László <nemethl@gyorsposta.hu>:
|
|
* src/morphbase/hashmgr.cxx: set FLAG attribute instead of FLAG_NUM and FLAG_LONG
|
|
|
|
* doc/hunspell.4: manual in English
|
|
|
|
2005-06-30 Németh László <nemethl@gyorsposta.hu>:
|
|
* src/morphbase/csutil.cxx: add character tables from csutil.cxx of OOo 1.1.4
|
|
|
|
* src/morphbase/affentry.cxx: fix Unicode condition checking
|
|
|
|
* tests/{,utf}compound.*: tests compounding
|
|
|
|
2005-06-27 Németh László <nemethl@gyorsposta.hu>:
|
|
* src/morphbase/*: fix Unicode compound handling
|
|
|
|
2005-06-23 Halácsy Péter:
|
|
* src/hunmorph/hunmorph.cxx: delete spelling error message and suggest_auto() call
|
|
|
|
2005-06-21 Németh László <nemethl@gyorsposta.hu>:
|
|
* src/morphbase: Unicode support
|
|
* tests/utf8.*: SET UTF-8 test
|
|
|
|
* src/morphbase: checking and fixing with Valgrind
|
|
Memory handling error reported by Ferenc Szidarovszky
|
|
|
|
2005-05-26 Németh László <nemethl@gyorsposta.hu>:
|
|
* suggestmgr.cxx: fix stemming
|
|
* AUTHORS, COPYING, ChangeLog: set CC-LGPL free software license
|
|
|
|
2004-05-25 Varga Dániel <daniel@all.hu>
|
|
* src/stemtool: new subproject
|
|
|
|
2005-05-25 Halácsy Péter <peter@halacsy.com>
|
|
* AUTHORS, COPYING: set CC Attribution license
|
|
|
|
2004-05-23 Varga Dániel <daniel@all.hu>
|
|
* src: - modifications for compiling with Visual C++
|
|
|
|
* src/hunmorph/csutil.cxx: correcting header of flag_qsort(),
|
|
* src/hunmorph/*: correct csutil include
|
|
|
|
2005-05-19 Németh László <nemethl@gyorsposta.hu>
|
|
* csutil.cxx: fix loop condition in lineuniq()
|
|
bug reported by Viktor Nagy (nagyv nyelvtud hu).
|
|
|
|
* morphbase.cxx: handle PSEUDOROOT with zero affixes
|
|
bug reported by Viktor Nagy (nagyv nyelvtud hu).
|
|
* tests/zeroaffix.*: add zeroaffix tests
|
|
|
|
2005-04-09 Németh László <nemethl@gyorsposta.hu>
|
|
* config.h.in: reset with autoheader
|
|
|
|
* src/hunspell/hunspell.cxx: set version
|
|
|
|
2005-04-06 Németh László <nemethl@gyorsposta.hu>
|
|
* tests: tests
|
|
|
|
* src/morphbase:
|
|
New optional parameters in affix file:
|
|
- PSEUDOROOT: for forbidding a root whose suffixed forms are not forbidden.
|
|
- COMPOUNDWORDMAX: max. words in compounds (default is no limit)
|
|
- COMPOUNDROOT: marks compounds in the dictionary for handling special compound rules
|
|
- remove COMPOUNDWORD, ONLYROOT
|
|
|
|
2005-03-21 Németh László <nemethl@gyorsposta.hu>
|
|
* src/morphbase/*:
|
|
- 2-byte flags, FLAG_NUM, FLAG_LONG
|
|
- CIRCUMFIX: signed suffixes and prefixes can only occur together
|
|
- ONLYINCOMPOUND for fogemorphemes (Swedish, Danish) or Fuge-elements (German)
|
|
- COMPOUNDBEGIN: allow signed roots, and roots with signed suffix, at the beginning of compounds
|
|
- COMPOUNDMIDDLE: like before, but middle of compounds
|
|
- COMPOUNDEND: like before, but end of compounds
|
|
- remove COMPOUNDFIRST, COMPOUNDLAST
|