2007-11-01 Németh László <nemeth at OOo>:
|
|
* hunspell/*: new feature: morphological generation,
|
|
also fix experimental morphological analysis and stemming.
|
|
- new API functions and improved API:
|
|
- analyze(word): (instead of morph()) morphological analysis
|
|
- stem(word): stemming
|
|
- stem(list): stemming based on the result of an analysis
|
|
- generate(word, word2): morphological generation
|
|
- generate(word, list): morphological generation
|
|
- add(word): add word to the run-time dictionary (renamed put_word())
|
|
- add_with_affix(word, word2): (renamed put_word_pattern()):
|
|
add word to the run-time dictionary with affix flags of the
|
|
second parameter: all affixed forms of the user words will be
|
|
recognised by the spell checker. Especially useful for
|
|
agglutinative languages.
|
|
- remove(word): remove word from the run-time dictionary (not
|
|
implemented)
|
|
- see manual and hunspell/hunspell.hxx header and tests/morph.*
|
|
* tests/morph.*: test data, example for morphological analysis,
|
|
stemming and generation
|
|
|
|
* tools/analyze, tools/chmorph: extended and new demo applications:
|
|
- analyze (originally hunmorph): analyses and stems input words,
|
|
generates word forms from input word pairs.
|
|
- chmorph: morphological transformation filter
|
|
|
|
* configure.ac, hunspell/makefile.am: set library version number.
|
|
Bug reported by Rene Engelhard.
|
|
|
|
* affentry.cxx, affixmgr.cxx: new pattern matching algorithm in
|
|
condition checking of affix rules instead of the Dömölki-algorithm:
|
|
- Unlimited condition length (instead of max. 8 characters).
|
|
- Less memory consumption, especially useful for affix rich languages:
|
|
5.4 MB memory savings with hu_HU dictionary.
|
|
- Speed change depends on dictionaries and CPU caches: English spell
|
|
checking is 4% faster on Linux words with en_US dictionary, Hungarian
|
|
spell checking is 25% slower on most frequent words of Hungarian
|
|
Webcorpus.
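Illustration (not the Hunspell implementation): a minimal sketch of how an
affix condition of arbitrary length can be checked directly against the end
of a word. The names CondItem, parse_condition and suffix_condition_matches
are hypothetical, and the sketch assumes 8-bit characters and only the
literal, ".", "[abc]" and "[^abc]" condition forms.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// One position of a condition: a character set, optionally negated,
// or a wildcard that accepts any character. (Hypothetical helper type.)
struct CondItem {
    bool any = false;
    bool negated = false;
    std::string chars;
};

// Parse a condition such as "[^aeiou]y" into one item per character position.
std::vector<CondItem> parse_condition(const std::string& cond) {
    std::vector<CondItem> items;
    for (std::size_t i = 0; i < cond.size(); ++i) {
        CondItem item;
        if (cond[i] == '.') {
            item.any = true;
        } else if (cond[i] == '[') {
            ++i;
            if (i < cond.size() && cond[i] == '^') {
                item.negated = true;
                ++i;
            }
            while (i < cond.size() && cond[i] != ']') {
                item.chars += cond[i++];
            }
            // i now points at ']' (or the end); the loop increment skips it.
        } else {
            item.chars = std::string(1, cond[i]);
        }
        items.push_back(item);
    }
    return items;
}

// True if the last characters of `word` satisfy the suffix condition.
// The condition may have any number of positions.
bool suffix_condition_matches(const std::string& word, const std::string& cond) {
    const std::vector<CondItem> items = parse_condition(cond);
    if (items.size() > word.size()) return false;
    const std::size_t start = word.size() - items.size();
    for (std::size_t k = 0; k < items.size(); ++k) {
        const CondItem& item = items[k];
        if (item.any) continue;
        const bool in_set = item.chars.find(word[start + k]) != std::string::npos;
        if (in_set == item.negated) return false;
    }
    return true;
}
```

For example, the condition "[^aeiou]y" accepts "happy" and rejects "play".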
|
|
|
|
* tests/sug.*, sugutf.*: updated test data (use "a" and "lot"
|
|
dictionary items instead of "a lot".)
|
|
|
|
* src/hunspell/hunspell.cxx: free(csconv) instead of delete csconv.
|
|
Report and patch by Sylvain Paschein in Mozilla Issue 398268.
|
|
|
|
* suggestmgr.cxx, tools/hunspell.cxx: bad spelling of "misspelled".
|
|
Ubuntu Bug #134792, patch by Malcolm Parsons.
|
|
|
|
* tests/base_utf.*: use Unicode apostrophe instead of 8-bit one.
|
|
|
|
* hunspell.cxx, hashmgr.cxx: add(): use HashMgr::add()
|
|
|
|
2007-10-25 Pavel Janík <pjanik at OOo>:
|
|
* hunspell/csutil.cxx: Fix type cast warnings on 64bit Linux in
|
|
printing of character positions in u8_u16(). OOo issue 82984.
|
|
|
|
2007-09-05 Németh László <nemeth at OOo>:
|
|
* win_api/Hunspell.vproj, parsers/testparser.cxx,textparser.hxx:
|
|
warning fixes and removing unnecessary Windows project file.
|
|
Reported by Ingo H. De Boer.
|
|
|
|
* hashmgr.*, {affixmgr,suggestmgr}.cxx: optimized data structure
|
|
for variable-count fields (only "ph" transliteration field in
|
|
this version, see next item). Also less memory consumption:
|
|
-13% (0.75 MB) with en_US dictionary, -6% (1 MB) with hu_HU.
|
|
|
|
* suggestmgr.cxx: dictionary based phonetic suggestion for special
|
|
or foreign pronunciation (see also rule-based PHONE in manual).
|
|
Usage: tab separated field in dictionary lines, started with "ph:".
|
|
The field contains a phonetic transliteration of the word:
|
|
|
|
Marseille ph:maarsayl
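Illustration: a minimal sketch of reading the "ph:" field from a
tab-separated dictionary line, as described in the usage note above. The
function name phonetic_field is hypothetical; this is not the parser used
in suggestmgr.cxx.

```cpp
#include <sstream>
#include <string>

// Return the phonetic transliteration stored in a "ph:" field of a
// tab-separated dictionary line, or an empty string if there is none.
std::string phonetic_field(const std::string& dic_line) {
    std::istringstream fields(dic_line);
    std::string field;
    bool first = true;
    while (std::getline(fields, field, '\t')) {
        if (first) {  // the first field is the word itself
            first = false;
            continue;
        }
        if (field.compare(0, 3, "ph:") == 0) return field.substr(3);
    }
    return "";
}
```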
|
|
* tests/phone.*: test data for dictionary and rule based phonetic
|
|
suggestion.
|
|
|
|
* hunspell.cxx: fix potential bad memory access in allcap word
|
|
capitalization in suggest() (bug of previous version).
|
|
|
|
* hunspell.cxx, atypes.hxx: set correct limit for UTF-8 encoded
|
|
input words (256 byte).
|
|
|
|
* suggestmgr.cxx: improved REP suggestions with spaces: it works
|
|
without dictionary modification.
|
|
OOo issue 80147, reported by Davide Prina.
|
|
* tests/rep.*: new test data: higher priority for "alot" -> "a lot",
|
|
and Italian suggestion "un'alunno" -> "un alunno".
|
|
|
|
* affixmgr.cxx: fix Unicode ngram suggestions in expand_rootword().
|
|
(Suggestions with bad affixes.)
|
|
Bug reported by Vitaly Piryatinksy <piv dot v dot vitaly at gmail>.
|
|
* tests/ngram_utf_fix.*: test based on Vitaly Piryatinksy's data.
|
|
|
|
* suggestmgr.cxx: fix twowords() for last UTF-8 multibyte character.
|
|
(conditional jump or move depended on uninitialised value).
|
|
|
|
2007-08-29 Ingo H. De Boer <idb_winshell at SF.net>:
|
|
* win_api/{hunspell,libhunspell, testparser}.vcproj: new project
|
|
files for the library and the executables.
|
|
|
|
* Hunspell.rc, Hunspell.sln, config.h: updated versions.
|
|
Version number problem also reported by András Tímár.
|
|
|
|
2007-08-27 Németh László <nemeth at OOo>:
|
|
* suggestmgr.hxx: put fixed version. Bug report by Ingo H. De Boer.
|
|
|
|
* suggestmgr.cxx: remove variable-length local character array
|
|
reported by Ingo H. De Boer.
|
|
|
|
2007-08-27 Németh László <nemeth at OOo>:
|
|
* suggestmgr.hxx: change bad time_t to clock_t in header, too.
|
|
Bug reports or patches by Ingo H. De Boer under SF.net
|
|
Bug ID 1781951, János Mohácsi and Gábor Zahemszky, András Tímár,
|
|
OMax3 at SF.net under SF.net Bug ID 1781592.
|
|
|
|
* phonet.*: change variable-length local character array to
|
|
portable fixed size character array. Problem reported by
|
|
Ingo H. De Boer under SF.net Bug ID 1781951 and
|
|
Ryan VanderMeulen.
|
|
|
|
* suggestmgr.cxx: remove debug message (also by
|
|
Ingo H. De Boer).
|
|
|
|
2007-08-26 Ingo H. De Boer <idb_winshell at SF.net>:
|
|
* win_api/Hunspell.vcproj: updated version (with phonet.*)
|
|
|
|
2007-08-23 Németh László <nemeth at OOo>:
|
|
* phonet.{c,h}xx, suggestmgr.cxx: PHONE parameter:
|
|
pronunciation based suggestion using Björn Jacke's original Aspell
|
|
phonetic transcription algorithm (http://aspell.net), relicensed
|
|
under GPL/LGPL/MPL tri-license with the permission of the author.
|
|
Usage: see manual.
|
|
|
|
* affixmgr,suggestmgr.cxx: add KEY parameter for keyboard and
|
|
input method error related suggestions.
|
|
Example: KEY qwertyuiop|asdfghjkl|zxcvbnm
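Illustration: one plausible way to turn a KEY definition into suggestion
candidates is to replace each letter with its left and right neighbour in
the same keyboard row. The function name keyboard_candidates is
hypothetical, and the real suggestion code may generate, order and filter
candidates differently; candidates would still have to be checked against
the dictionary.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Generate candidates by replacing each character of `word` with the
// neighbouring keys from a KEY string such as
// "qwertyuiop|asdfghjkl|zxcvbnm" ('|' separates keyboard rows).
std::vector<std::string> keyboard_candidates(const std::string& word,
                                             const std::string& key) {
    std::vector<std::string> out;
    for (std::size_t i = 0; i < word.size(); ++i) {
        if (word[i] == '|') continue;  // the row separator is not a key
        for (std::size_t pos = key.find(word[i]); pos != std::string::npos;
             pos = key.find(word[i], pos + 1)) {
            if (pos > 0 && key[pos - 1] != '|') {  // left neighbour
                std::string cand = word;
                cand[i] = key[pos - 1];
                out.push_back(cand);
            }
            if (pos + 1 < key.size() && key[pos + 1] != '|') {  // right neighbour
                std::string cand = word;
                cand[i] = key[pos + 1];
                out.push_back(cand);
            }
        }
    }
    return out;
}
```

With the KEY line above, the misspelling "wird" yields "word" among its
candidates, because "o" is next to "i" in the first row.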
|
|
|
|
* man/hunspell.4: description about PHONE and KEY suggestion parameters.
|
|
|
|
* suggestmgr.cxx: enhancements for better suggestions:
|
|
- Set ngram suggestions for badchar-type errors
|
|
and only two word and compound word suggestions, too.
|
|
- Separate not compound and compound word
|
|
suggestions for MAP suggestion, too.
|
|
- Double swap suggestions for short words.
|
|
For example: ahev -> have, hwihc -> which.
|
|
- Better time limits using clock() instead of time()
|
|
(tenths of a second resolution instead of second ones).
|
|
- leftcommonsubstring() weight function.
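Illustration of the double swap item above: a minimal sketch that applies
two non-overlapping swaps of adjacent characters. The function name
double_swap_candidates is hypothetical; restricting the method to short
words and checking candidates against the dictionary are left to the
caller, and the real code may enumerate fewer combinations.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Generate every word obtained by swapping two non-overlapping pairs of
// adjacent characters, e.g. "ahev" -> "have".
std::vector<std::string> double_swap_candidates(const std::string& word) {
    std::vector<std::string> out;
    if (word.size() < 4) return out;  // two disjoint pairs need 4 characters
    for (std::size_t i = 0; i + 3 < word.size(); ++i) {
        for (std::size_t j = i + 2; j + 1 < word.size(); ++j) {
            std::string cand = word;
            std::swap(cand[i], cand[i + 1]);  // first swap
            std::swap(cand[j], cand[j + 1]);  // second swap
            out.push_back(cand);
        }
    }
    return out;
}
```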
|
|
|
|
* htype.hxx, hashmgr.cxx: blen (byte length) and clen (character
|
|
length) fields instead of wlen
|
|
|
|
* affixmgr.cxx: fix get_syllable() for bad Unicode inputs.
|
|
|
|
* tests/suggestiontest/*: test environment for suggestions
|
|
|
|
2007-08-07 Martijn Wargers:
|
|
* csutil.cxx: fix Mingw build error associated with ToUpper() call.
|
|
Report and patch in Mozilla Issue 391447.
|
|
|
|
2007-08-07 Robert Longson:
|
|
* atypes.cxx: use empty inline function HUNSPELL_WARNING instead of
|
|
variadic macros to switch off Hunspell warnings.
|
|
Reported by Gavin Sharp in Mozilla Issue 391147.
|
|
|
|
2007-08-05 Ginn Chen:
|
|
* hashmgr.cxx: Hunspell failed to compile on OpenSolaris (use stdio
|
|
instead of cstdio). Report and patch in Mozilla Issue 391040.
|
|
|
|
2007-07-25 Németh László <nemeth at OOo>:
|
|
* parsers/*.cxx: Hunspell executable recognises and accepts URLs,
|
|
e-mail addresses, directory paths, reported by Jeppe Bundsgaard.
|
|
* src/tools/hunspell.cxx: --check-url: new option of Hunspell program.
|
|
Use --check-url if you want to check URLs, e-mail addresses and paths.
|
|
|
|
* parsers/textparser.cxx: strip colon at end of words for Finnish
|
|
and Swedish (colon may be in words in Finnish and Swedish).
|
|
Problem reported by Lars Aronsson.
|
|
* tests/colons_in_words.*: test data
|
|
|
|
* tests/digits_in_words.*: example for using digits in words
|
|
(eg. 1-jährig, 112-jährig etc. in German), reported by Lars Aronsson.
|
|
|
|
* hashmgr.cxx: Hunspell accepts allcaps forms of mixed case
|
|
words of personal dictionaries (+allcaps custom dictionary words with
|
|
allcaps affixes).
|
|
Sf.net Bug ID 1755272, reported by Ellis Miller.
|
|
|
|
* hashmgr.cxx: fix small memory leaks with alias compressed
|
|
dictionaries (free flag vectors of affixed personal dictionary words
|
|
and flag vectors of hidden capitalized forms of mixed case and
|
|
allcaps words).
|
|
|
|
* affixmgr.cxx: fix COMPOUNDRULE checking with affixed compounds.
|
|
Sf.net Bug ID 1706659, reported by Björn Jacke. Also fixing for
|
|
OOo Issue 76067 (crash-like deceleration for hexadecimal numbers
|
|
with long FFFFFF sequence using en_US dictionary).
|
|
|
|
* tools/hunspell.cxx: add missing return to save_privdic().
|
|
|
|
* man/hunspell.4: add information about affixation of personal words:
|
|
"Personal dictionaries are simple word lists, but with optional
|
|
word patterns for affixation, separated by a slash:
|
|
|
|
foo
|
|
Foo/Simpson
|
|
|
|
In this example, "foo" and "Foo" are personal words, plus Foo
|
|
will be recognised with affixes of Simpson (Foo's etc.)."
|
|
|
|
2007-07-18 Németh László <nemeth at OOo>:
|
|
* src/win_api/: add missing resource files, reported by Ingo H. De Boer.
|
|
|
|
2007-07-16 Németh László <nemeth at OOo>:
|
|
* hunspell.cxx: fix dot removing from UTF-8 encoded words in cleanword2()
|
|
(Capitalised words with dots, as "Something." were not recognised
|
|
using Unicode encoded dictionaries.)
|
|
* tests/{base.*,base_utf.*}: extended and new test files for
|
|
dot removing and Unicode support.
|
|
|
|
* tools/hunspell.cxx: fix Cygwin, OS X compatibility using platform
|
|
specific iconv() header via the ICONV_CONST macro of Autoconf.
|
|
Sf.net Bug ID 1746030, reported by Mike Tian-Jian Jiang.
|
|
Sf.net Bug ID 1753939, reported by Jean-Christophe Helary.
|
|
|
|
* tools/hunspell.cxx: fix missing global path setting with -d option.
|
|
|
|
* tests/test.sh: fix broken Valgrind checking (missing warnings
|
|
with VALGRIND=memcheck make check).
|
|
|
|
* csutil.cxx: fix condition in u8_u16() to avoid invalid read
|
|
of not null-terminated character arrays (detected by Valgrind
|
|
in Hunspell executable: associated with 8-bit character table
|
|
conversion in tools/hunspell.cxx).
|
|
|
|
* csutil.cxx: free_utf_tbl(): use utf_tbl_count-- instead of utf_tbl--.
|
|
Memory leak in Hunspell executable detected by Valgrind.
|
|
|
|
* hashmgr.cxx: add missing free_utf_tbl(), memory leak in Hunspell
|
|
executable detected by Valgrind.
|
|
|
|
* hashmgr.cxx: load_tables(): fix memory error in spec. capitalization.
|
|
Use sizeof(unsigned short) instead of bad sizeof(unsigned short*).
|
|
Invalid memory read detected by Valgrind.
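Illustration of this class of bug: sizes for an array of unsigned short
must use the element size. With sizeof(unsigned short*) the byte count is
based on the pointer size (typically 4 or 8), so a copy reads past the end
of the source array. The function copy_flags is a hypothetical helper, not
the code in load_tables().

```cpp
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Duplicate a flag vector. The caller releases the copy with std::free().
unsigned short* copy_flags(const unsigned short* flags, std::size_t count) {
    // Element size, not pointer size.
    const std::size_t bytes = count * sizeof(unsigned short);
    void* mem = std::malloc(bytes);
    if (!mem) return nullptr;  // allocation failure is reported to the caller
    std::memcpy(mem, flags, bytes);
    return static_cast<unsigned short*>(mem);
}
```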
|
|
|
|
* hashmgr.cxx: add_word(): fix memory error in spec. capitalization.
|
|
Update also affix array length of capitalized homonyms. Invalid
|
|
memory read detected by Valgrind.
|
|
|
|
* hunspell.cxx: suggest(): fix invalid memory write and leak.
|
|
Bad realloc() and missing free() detected by Valgrind associated
|
|
with suggestions for "something.The" type spelling errors.
|
|
|
|
* {dictmgr,csutil,hashmgr,suggestmgr}.cxx: check memory allocation.
|
|
Sf.net Bug ID 1747507, based on the patch by Jose da Silva.
|
|
|
|
2007-07-13 Ingo H. De Boer <idb_winshell at SF.net>:
|
|
* atypes.cxx: fix Visual C compatibility: Using
|
|
"HUNSPELL_WARNING(a,b,...} {}" macro instead of empty "X(a,b...)".
|
|
|
|
* hunspell.cxx: changes for Windows API.
|
|
* win_api/Hunspell.*: new resource files
|
|
* win_api/hunspelldll.*: set optional Hunspell and Borland spec. codes
|
|
Sf.net Bug ID 1753802, patch by Ingo H. de Boer.
|
|
See also Sf.net Bug ID 1751406, patch by Mike Tian-Jian Jiang.
|
|
|
|
2007-07-09 Caolan McNamara <cmc at OO.o>:
|
|
* {hunspell,hashmgr,affentry}.cxx: fix warnings of Coverity program
|
|
analyzer. Sf.net Bug ID 1750219.
|
|
|
|
2007-07-06 Németh László <nemeth at OOo>:
|
|
* atypes.cxx: warning-free swallowing of conditional warning messages
|
|
and their parameters using empty HUNSPELL_WARNING(a,b...) macro.
|
|
* {affixmgr,atypes,csutil}.cxx: fix unused variable warnings
|
|
using WARNVAR macro for conditionally named variables.
|
|
* hashmgr.cxx: fix unused variable warning in add_word() by cond. name
|
|
* hunspell.cxx: fix shadowed declaration of captype var. in suggest()
|
|
|
|
2006-06-29 Caolan McNamara <cmc at OO.o>:
|
|
* hunspell.cxx: patch to fix possible memory leak in analyze() of
|
|
experimental morphological analyzer code. Sf.net Bug ID 1745263.
|
|
|
|
2007-06-29 Németh László <nemeth at OOo>:
|
|
improvements:
|
|
* src/hunspell/hunspell.cxx: check bad capitalisation of Dutch letter IJ.
|
|
- Sf.net Feature Request ID 1640985, reported by Frank Fesevur.
|
|
- Solution: FORBIDDENWORD for capitalised word forms (need
|
|
an improved Dutch dictionary with forbidden words: Ijs/*, etc.).
|
|
* tests/IJ.*: test data and example.
|
|
|
|
* hashmgr.cxx, hunspell.cxx: check capitalization of special word forms
|
|
- words with mixed capitalisation: OpenOffice.org - OPENOFFICE.ORG
|
|
Sf.net Bug ID 1398550, reported by Dmitri Gabinski.
|
|
- allcap words and suffixes: UNICEF's - UNICEF'S
|
|
- prefixes with apostrophe and proper names: Sant'Elia - SANT'ELIA
|
|
For Catalan, French and Italian languages.
|
|
Reported by Davide Prina in OOo Issue 68568.
|
|
* tests/allcaps*: tests for OPENOFFICE.ORG, UNICEF'S capitalization.
|
|
* tests/i68568*: tests for SANT'ELIA capitalization.
|
|
|
|
* hunspell/hunspell.cxx: suggestion for missing sentence spacing:
|
|
something.The -> something. The
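Illustration: a minimal sketch of this kind of suggestion. It assumes that
the trigger is a dot inside the word followed directly by an upper-case
ASCII letter; the entry above does not state the exact condition used in
hunspell.cxx, and the function name is hypothetical.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Return the word with a space inserted after the first inner dot that is
// followed by an upper-case letter, or an empty string if there is none.
std::string sentence_spacing_suggestion(const std::string& word) {
    for (std::size_t i = 1; i + 1 < word.size(); ++i) {
        if (word[i] == '.' &&
            std::isupper(static_cast<unsigned char>(word[i + 1]))) {
            return word.substr(0, i + 1) + " " + word.substr(i + 1);
        }
    }
    return "";
}
```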
|
|
|
|
* tools/hunspell.cxx: multiple character encoding support
|
|
- -i option: custom input encoding
|
|
Sf.net Bug ID 1610866, reported by Thobias Schlemmer.
|
|
Sf.net Bug ID 1633413, reported by Dan Kenigsberg.
|
|
See also hunspell-1.1.5-encoding.patch of Fedora from Caolan McNamara.
|
|
* tests/*.test: add input encodings
|
|
|
|
* tools/hunspell.cxx: use locale data for default dictionary names.
|
|
Sf.net Bug ID 1731630, report and patch from Bernhard Rosenkraenzer,
|
|
See also hunspell-1.1.4-defaultdictfromlang.patch of Fedora Linux
|
|
from Caolan McNamara.
|
|
|
|
* tools/hunspell.cxx: fix 8-bit tokenization (letters without
|
|
casing, like ß or Hebrew characters now are handled well)
|
|
|
|
* tools/hunspell.cxx: dictionary search path
|
|
- DICPATH environmental variable
|
|
- -D option: show directory path of loaded dictionary
|
|
- automatic detection of OpenOffice.org directories
|
|
|
|
fixes:
|
|
* affixmgr.cxx: fault-tolerant patch for REP and other affix
|
|
table data problems. Problem with Hunspell and en_GB dictionary
|
|
reported by Thomas Lange in OOo Issue 76098 and
|
|
Stephan Bergmann in OOo Issue 76100.
|
|
Sf.net Bug ID 1698240, reported by Ingo H. de Boer.
|
|
|
|
* csutil.cxx: fix mkallcap_utf() for allcaps suggestion in UTF-8.
|
|
|
|
* suggestmgr.cxx: fix bad movechar_utf() (missing strlen()).
|
|
|
|
* hunspell.cxx: fix bad degree sign detection in Unicode
|
|
hu_HU environment.
|
|
|
|
* hunspell/hunspell.cxx: free allocated memory of csconv in
|
|
ported Mozilla code.
|
|
- Mozilla Bugzilla Bug 383564, report and Mozilla MySpell patch
|
|
by Andrew Geul. Reported by Ryan VanderMeulen for Hunspell.
|
|
|
|
* suggestmgr.cxx: fix minor difference in Unicode suggestion
|
|
(ngram suggestion of allcaps words in Unicode).
|
|
|
|
* hashmgr.cxx: close file handle after errors.
|
|
Sf.net Bug ID 1736286, reported by John Nisly.
|
|
|
|
* configure.ac: syntax error (shell variable with spaces).
|
|
Sf.net Bug ID 1731625, reported by Bernhard Rosenkraenzer.
|
|
|
|
* hunspell.cxx: check_word(): fix bad usage of info pointer.
|
|
|
|
* hashmgr.cxx: fix de_DE related bug (accept words with leading dash).
|
|
Sf.net Bug ID 1696134, reported by Björn Jacke.
|
|
|
|
* suggestmgr.cxx, tests/1695964.*: fix NEEDAFFIX homonym suggestion.
|
|
Sf.net Bug ID 1695964, reported by Björn Jacke.
|
|
|
|
* tests/1463589*: capitalized ngram suggestion test data for
|
|
Sf.net Bug ID 1463589, reported by Frederik Fouvry.
|
|
|
|
* csutil.cxx, affixmgr.cxx: fix possible heap error with
|
|
multiple instances of utf_tbl.
|
|
Sf.net Bug ID 1693875, reported by Ingo H. de Boer.
|
|
|
|
* affixmgr.cxx, suggestmgr.cxx, license.hunspell: convert to ASCII.
|
|
Locale dependent compiling problems. Sf.net Bug ID 1694379, reported
|
|
by Mike Tian-Jian Jiang. OOo Issue 78018 reported by Thomas Lange.
|
|
|
|
* tests/test.sh: compatibility issues
|
|
- fix Valgrind support (check shared library instead of shell wrapper)
|
|
- remove deprecated "tail +2" syntax
|
|
- set 8-bit locale for testing (LC_ALL=C)
|
|
|
|
* hunspell.hxx: remove license.* and config.h dependencies.
|
|
- hunspell-1.1.5-badheader.patch from Caolan McNamara <cmc at OO.o>
|
|
|
|
2007-03-21 Németh László <nemeth at OOo>:
|
|
* tools/Makefile.am, munch.h, unmunch.h: add missing munch.h and unmunch.h
|
|
Reported by Björn Jacke and Khaled Hosny (sf.net Bug ID 1684144)
|
|
* hunspell/hunspell.cxx, hunspell.hxx: fix --with-ui compiling error (add get_csconv())
|
|
Reported by Khaled Hosny (sf.net Bug ID 1685010)
|
|
|
|
2007-03-19 Németh László <nemeth at OOo>:
|
|
* csutil.cxx, hunspell/hunspell.cxx: Unicode non BMP area (>65K character range) support
|
|
(except conditional patterns and strip characters of affix rules)
|
|
* tests/utf8_nonbmp*: test data
|
|
|
|
* src/hunspell/*: add Mozilla patches from David Einstein
|
|
- run-time generated 8-bit character tables
|
|
- other Mozilla related changes (see Mozilla Bugzilla Bug 319778)
|
|
|
|
* csutil.cxx, affixmgr.cxx, hashmgr.cxx: optimized version of IGNORE feature
|
|
- IGNORE works with affixes (except strip characters and affix conditions)
|
|
* tests/ignore*: test data with latin characters
|
|
* tests/ignoreutf*: Unicode test data with Arabic diacritics (Harakat)
|
|
|
|
* src/hunspell/suggestmgr.cxx: new edit distance suggestion methods
|
|
- capitalization: nasa -> NASA
|
|
- long swap: permenant -> permanent
|
|
- long mov.: Ghandi -> Gandhi
|
|
- double two characters: vacacation -> vacation
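Illustration of the last method in the list above: a minimal sketch that
removes one copy of an accidentally doubled two-character sequence. The
function name undouble_candidates is hypothetical and the real code in
suggestmgr.cxx may detect the pattern differently.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// For every position where a two-character sequence is immediately
// repeated, drop one copy: "vacacation" -> "vacation".
std::vector<std::string> undouble_candidates(const std::string& word) {
    std::vector<std::string> out;
    for (std::size_t i = 0; i + 4 <= word.size(); ++i) {
        if (word.compare(i, 2, word, i + 2, 2) == 0) {
            const std::string cand = word.substr(0, i) + word.substr(i + 2);
            // Overlapping repeats often give the same result; keep it once.
            if (std::find(out.begin(), out.end(), cand) == out.end()) {
                out.push_back(cand);
            }
        }
    }
    return out;
}
```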
|
|
* tests/sug.*: test data
|
|
|
|
* src/hunspell/affixmgr.cxx: space in REP strings (alot -> a lot)
|
|
Note: an underscore character stands for the space in REP strings: REP alot a_lot, and
|
|
put the expression with space ("a lot") into the dic file (see tests/sug).
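Illustration: a minimal sketch of applying one REP rule whose replacement
contains an underscore for the space. The function name rep_candidates is
hypothetical; the candidates would still be checked against the dictionary.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Apply a single REP rule (from -> to) at every position where `from`
// occurs. Underscores in `to` are turned into spaces first.
std::vector<std::string> rep_candidates(const std::string& word,
                                        const std::string& from,
                                        const std::string& to) {
    std::vector<std::string> out;
    if (from.empty()) return out;
    std::string replacement = to;
    std::replace(replacement.begin(), replacement.end(), '_', ' ');
    for (std::size_t pos = word.find(from); pos != std::string::npos;
         pos = word.find(from, pos + 1)) {
        out.push_back(word.substr(0, pos) + replacement +
                      word.substr(pos + from.size()));
    }
    return out;
}
```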
|
|
|
|
* hashmgr.cxx, affixmgr.cxx: ignore Unicode byte order mark (BOM sequence)
|
|
* tests/utf8_bom*: test data
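Illustration: the UTF-8 encoded byte order mark is the three-byte sequence
EF BB BF at the start of a file. A minimal sketch of skipping it when the
first line is read (strip_utf8_bom is a hypothetical helper, not the
function used in hashmgr.cxx or affixmgr.cxx):

```cpp
#include <string>

// Remove a leading UTF-8 byte order mark, if present.
std::string strip_utf8_bom(const std::string& line) {
    static const char bom[] = "\xEF\xBB\xBF";
    if (line.compare(0, 3, bom) == 0) return line.substr(3);
    return line;
}
```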
|
|
|
|
* hunspell/*.cxx: OOo Issue 68903 - Make lingucomponent warning-free on wntmsci10
|
|
- fix Hunspell related warning messages on Windows platform (except some assignment
|
|
within conditional expressions). Reported and started by Stephan Bergmann.
|
|
|
|
* hunspell/affixmgr.cxx: fix OOo Issue 66683 - hunspell dmake debug=x fails
|
|
- Reported by Stephan Bergmann.
|
|
|
|
* src/hunspell/hunspell.[ch]xx: thread safe API for Hunspell executable
|
|
(removing prev*() functions, new spell(word, info, root) function)
|
|
|
|
* configure.ac, src/hunspell/*: HUNSPELL_EXPERIMENTAL code
|
|
--with-experimental configure option (conditional compiling of morphological analyser
|
|
and stemmer tools)
|
|
|
|
* configure.ac, src/hunspell/*: conditional Hunspell warning messages
|
|
--with-warnings configure option
|
|
|
|
* affixmgr.cxx: new, optimized parsing functions
|
|
|
|
* affixmgr.cxx: fix homonym handling for German dictionary project,
|
|
reported by Björn Jacke (sf.net Bug ID 1592880).
|
|
* tests/1592880.*: test data by Björn Jacke
|
|
|
|
* src/hunspell/affixmgr.cxx: fix CIRCUMFIX suggestion
|
|
Bug reported by Erdal Ronahi.
|
|
|
|
* hunspell.cxx: reverse root word output (complex prefixes)
|
|
Bug reported by Munzir Taha.
|
|
|
|
* tools/hunspell.cxx: fix Emacs compatibility, patch by marot at sf.net
|
|
- no % command in PIPE mode (SourceForge BugTracker 1595607)
|
|
- fix HUNSPELL_VERSION string
|
|
|
|
* suggestmgr.[hc]xx: rename check() functions to checkword() (OOo Issue 68296)
|
|
adopt MySpell patch by Bryan Petty (tierra at ooo) for Hunspell source
|
|
|
|
* csutil.cxx, munch.c, unmunch.c: adopt relevant parts of the MinGW patch
|
|
(OOo Issue 42504) by tonal at ooo
|
|
|
|
* affixmgr.cxx: remove double candidate_check() call, reported by Bram Moolenaar
|
|
|
|
* tests/test.sh: add LC_ALL="C" environment. Locale dependency of make check
|
|
reported by Gentoo project.
|
|
|
|
* src/tools/hunspell.cxx: UTF-8 highlighting fix for console UI
|
|
(not solved: breaking long UTF-8 lines)
|
|
|
|
* src/tools/unmunch.c: fix bad generation if strip is shorter than condition,
|
|
reported by Davide Prina
|
|
* src/tools/unmunch.h: increase 5000 -> 500000
|
|
|
|
* src/tools/hunspell.cxx: fix memory error in suggestion (uninitialized parameter),
|
|
Bug also reported by Björn Jacke in SourceForge Bug 1469957
|
|
|
|
* csutil.cxx, affixmgr.cxx: fix Caolan McNamara's patch for non OOo environment
|
|
|
|
2006-11-11 Caolan McNamara <cmc at OO.o>:
|
|
* csutil.cxx, affixmgr.cxx: UTF-8 table patch (OOo Issue 71449)
|
|
Description: memory optimization (OOo doesn't use the large UTF-8 table).
|
|
|
|
* Makefile.am: shared library patch (Sourceforge ID 1610756)
|
|
|
|
* hunspell.h, hunspell.cxx: C API patch (Sourceforge ID 1616353)
|
|
|
|
* hunspell.pc: pkgconfig patch (Sourceforge ID 1639128)
|
|
|
|
2006-10-17 Ryan Jones <at Mozilla Bugzilla>:
|
|
* affixmgr.cxx: missing fclose(affixlst) calls
|
|
Reported by <gavins at ooo> in OOo Issue 70408
|
|
|
|
2007-07-11 Taha Zerrouki <taha at gawab>:
|
|
* affixmgr.cxx, hunspell.cxx, hashmgr.cxx, csutil.cxx: IGNORE feature to remove
|
|
optional Arabic and other characters from input and dictionary words.
|
|
* src/hunspell/langnum.hxx: add Arabic language number, lang_ar=96
|
|
* tests/ignore.*: test data
|
|
|
|
2006-05-28 Miha Vrhovnik <mvrhov at users.sourceforge>:
|
|
* src/win_api/*: C API for Windows DLLs
|
|
- also Delphi text editor example (see on Hunspell Sourceforge page)
|
|
|
|
2006-05-18 Kevin F. Quinn <kevquinn at gentoo>:
|
|
* utf_info.cxx: struct -> static struct
|
|
Shared library patch also developed by Gentoo developers (Hanno Meyer-Thurow,
|
|
Diego Pettenò, Kevin F. Quinn)
|
|
|
|
2006-02-02 Németh László <nemethl@gyorsposta.hu>:
|
|
* src/hunspell/hunspell.cxx: suggest(): replace "fooBar" -> "foo bar" suggestions
|
|
with "fooBar" -> "foo Bar" (missing spaces are typical OCR bugs).
|
|
Bug reported by stowrob at OOo in Issue 58202.
|
|
* src/hunspell/suggestmgr.cxx: twowords(): permit 1-character words.
|
|
(restore MySpell's original behavior). Here: "aNew" -> "a New".
|
|
* tests/i58202.*: test data
|
|
|
|
* src/parsers/textparser.cxx: fix Unicode tokenization in is_wordchar()
|
|
(extra word characters (WORDCHARS) didn't work on big-endian platforms).
|
|
|
|
* src/hunspell/{csutil,affixmgr}.cxx: inline isSubset(), isRevSubset():
|
|
small speed optimization for languages with rich morphology.
|
|
|
|
* src/tools/hunspell.cxx: fix bad --with-ui and --with-readline compiling
|
|
when (N)curses is missing. Reported by Daniel Naber.
|
|
|
|
2006-01-19 Tor Lillqvist <tml@novell.com>
|
|
* src/hunspell/csutil.cxx: mystrsep(): fix locale-dependent isspace() tokenization
|
|
|
|
2006-01-06 András Tímár <timar@fsf.hu>
|
|
* src/hunspell/{hashmgr.hxx,hunspell.cxx}: fix Visual C++ compiling errors
|
|
|
|
2006-01-05 Németh László <nemethl@gyorsposta.hu>:
|
|
* COPYING: set GPL/LGPL/MPL tri-license for Mozilla integration.
|
|
Rationale: Mozilla source code contains an old MySpell version
|
|
with GPL/LGPL/MPL tri-license. (MPL license is a copyleft license, similar
|
|
to the LGPL, but it acts on file level.)
|
|
* COPYING.LGPL: GNU Lesser General Public License 2.1 (LGPL)
|
|
* COPYING.MPL: Mozilla Public License 1.1 (MPL)
|
|
* license.hunspell, src/hunspell/license.hunspell: GPL/LGPL/MPL tri-license
|
|
|
|
* src/hunspell/{affixmgr,hashmgr}.*: AF, AM alias definitions in affix file:
|
|
compression of flag sets and morphological descriptions (see manual,
|
|
and tests/alias* test files).
|
|
Rationale: Alias compression is also good for loading time and memory
|
|
efficiency, not only smaller resources.
|
|
* src/tools/makealias: alias compression utility
|
|
(usage: ./makealias file.dic file.aff)
|
|
* tests/alias{,2,3}: AF, AM tests
|
|
* man/hunspell.4: add AF, AM documentation
|
|
* src/hunspell/affentry.cxx, atypes.hxx: add new opts bits (aeALIASM, aeALIASF)
|
|
|
|
* tools/hunspell, src/parser/*, src/hunspell/*: Hunspell program
|
|
tokenizes Unicode texts (only with UTF-8 encoded dictionaries).
|
|
Missing Unicode tokenization reported by Björn Jacke, Egmont Koblinger,
|
|
Jess Body and others.
|
|
Note: the Curses interactive interface doesn't work perfectly yet.
|
|
* tests/*.tests: remove -1 parameters of Hunspell
|
|
* tests/*.{good,wrong}: remove tabulators
|
|
|
|
* src/hunspell/{hunspell,affixmgr}.cxx: BREAK option: break words at
|
|
specified break points and check word parts separately (see manual).
|
|
Note: COMPOUNDRULE is better (or will be better) for handling dashes and
|
|
other compound joining characters or character strings. Use BREAK, if you
|
|
want to check words with dashes or other joining characters and there is no time
|
|
or possibility to describe precise compound rules with COMPOUNDRULE.
|
|
* tests/break.*: BREAK example.
|
|
|
|
* src/hunspell/{affixmgr,hunspell}.cxx: add CHECKSHARPS declaration instead
|
|
of LANG de_DE definitions to handle German sharp s in both spelling and
|
|
suggestion.
|
|
* src/hunspell/hunspell.cxx: With CHECKSHARPS, uppercase words are valid
|
|
with both lower sharp s (it is optional for names in German legal texts)
|
|
and SS (MÜßIG, MÜSSIG). Missing lower sharp s form reported by Björn Jacke.
|
|
* src/hunspell/hunspell.cxx: KEEPCASE flag on a sharp s word has a special
|
|
meaning with CHECKSHARPS declaration: KEEPCASE permits capitalisation and SS upper
|
|
casing of a sharp s word (Müßig and MÜSSIG), but forbids the upper cased form
|
|
with lower sharp s character(s): *MÜßIG.
|
|
* tests/germancompounding*: add CHECKSHARPS, remove LANG
|
|
* tests/checksharps*: add CHECKSHARPS and KEEPCASE, remove LANG
|
|
|
|
* src/hunspell/hunspell.cxx: improved suggestions:
|
|
- suggestions for pressed Caps Lock problems: macARONI -> macaroni
|
|
- suggestions for long shift problems: MAcaroni -> Macaroni, macaroni
|
|
- suggestions for KEEPCASE words: KG -> kg
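Illustration of the case-related suggestions listed above: a minimal,
ASCII-only sketch that proposes the lower-case and the capitalised form of
a word. The function name case_fix_candidates is hypothetical; the real
code decides per error type which forms to offer, and every candidate
would still be checked against the dictionary.

```cpp
#include <cctype>
#include <string>
#include <vector>

// Propose case-normalised forms: all lower case first, then capitalised.
// Forms identical to the input are not returned.
std::vector<std::string> case_fix_candidates(const std::string& word) {
    std::string lower = word;
    for (char& c : lower) {
        c = static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
    }
    std::string capitalized = lower;
    if (!capitalized.empty()) {
        capitalized[0] = static_cast<char>(
            std::toupper(static_cast<unsigned char>(capitalized[0])));
    }
    std::vector<std::string> out;
    if (lower != word) out.push_back(lower);
    if (capitalized != word && capitalized != lower) out.push_back(capitalized);
    return out;
}
```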
|
|
* src/hunspell/csutil.cxx: fix mystrrep() function:
|
|
- suggestions for lower sharp s in uppercased words: MÜßIG -> MÜSSIG
|
|
* tests/checksharps{,utf}.sug: add tests for mystrrep() fix
|
|
|
|
* src/hunspell/hashmgr.cxx: Now dictionary words can contain slashes
|
|
with the "\/" syntax. Problem reported by Frederik Fouvry.
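Illustration: a minimal sketch of splitting a dictionary line into word and
flags while honouring the "\/" escape. DicEntry and parse_dic_line are
hypothetical names; this is not the parser in hashmgr.cxx.

```cpp
#include <cstddef>
#include <string>

// Word and flag parts of a dictionary line (hypothetical helper type).
struct DicEntry {
    std::string word;
    std::string flags;
};

// Split at the first unescaped slash; "\/" becomes a literal slash
// inside the word.
DicEntry parse_dic_line(const std::string& line) {
    DicEntry entry;
    for (std::size_t i = 0; i < line.size(); ++i) {
        if (line[i] == '\\' && i + 1 < line.size() && line[i + 1] == '/') {
            entry.word += '/';
            ++i;  // skip the escaped slash
        } else if (line[i] == '/') {
            entry.flags = line.substr(i + 1);
            break;
        } else {
            entry.word += line[i];
        }
    }
    return entry;
}
```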
|
|
|
|
* src/hunspell/hunspell.cxx: fix bad duplicate filter in suggest().
|
|
(Suggesting some capitalised compound words caused program crash
|
|
with Hungarian dictionary, OOo Issue 59055).
|
|
|
|
* src/hunspell/affixmgr.cxx: fix bad defcpd_check() call in compound_check().
|
|
(Overlapping new COMPOUNDRULE and old compounding methods caused program
|
|
crash at suggestion.)
|
|
|
|
* src/hunspell/affixmgr.{cxx,hxx}: check affix flag duplication at affix classes.
|
|
Suggested by Daniel Naber.
|
|
|
|
* src/hunspell/affentry.cxx: remove unused variable declarations (OOo i58338).
|
|
Compiler warnings reported by András Tímár and Martin Hollmichel.
|
|
|
|
* src/hunspell/hunspell.cxx: morph(): not analyse bad mixed uppercased forms
|
|
(fix Arabic morphological analysis with Buckwalter's Arabic transliteration)
|
|
|
|
* src/hunspell/affentry.{cxx,hxx}, atypes.hxx: little memory optimization
|
|
in affentry:
|
|
- using unsigned char fields instead of short (stripl, appndl, numconds)
|
|
- rename xpflg field to opts
|
|
- removing utf8 field, use aeUTF8 bit of opts field
|
|
|
|
* configure.ac: set tests/maputf.test to XFAILED on ARM platform.
|
|
Failure reported by Rene Engelhard.
|
|
|
|
* configure.ac: link Ncursesw library, if exists.
|
|
|
|
* BUGS: add BUGS file
|
|
|
|
* tests/complexprefixes2.*: test for morphological analysis with COMPLEXPREFIXES
|
|
|
|
* src/hunspell/affixmgr.cxx: use "COMPOUNDRULE" instead of
|
|
"COMPOUND". The new name suggested by Bram Moolenaar.
|
|
* tests/compoundrule*: modified and renamed compound.* test files
|
|
|
|
* man/hunspell.4: AF, AM, BREAK, CHECKSHARPS, COMPOUNDRULE, KEEPCASE.
|
|
- also new addition to the documentation:
|
|
The header of the dictionary file defines the approximate dictionary size:
|
|
``A dictionary file (*.dic) contains a list of words, one per line.
|
|
The first line of the dictionaries (except personal dictionaries)
|
|
contains the _approximate_ word count (for optimal hash memory size).''
|
|
Asked by Frederik Fouvry.
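Illustration: a minimal sketch of why an approximate count is enough. The
first line is only used to size the hash table before the words are
inserted. The sketch uses std::unordered_set as a stand-in for Hunspell's
own hash table, and load_dictionary is a hypothetical name.

```cpp
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>
#include <unordered_set>

// Read a word list whose first line holds the approximate word count.
std::unordered_set<std::string> load_dictionary(std::istream& in) {
    std::unordered_set<std::string> words;
    std::string line;
    if (std::getline(in, line)) {
        std::size_t approx = 0;
        std::istringstream header(line);
        header >> approx;       // stays 0 if the header is not a number
        words.reserve(approx);  // a hint only; the table still grows on demand
    }
    while (std::getline(in, line)) {
        if (!line.empty()) words.insert(line);
    }
    return words;
}
```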
|
|
|
|
One-character replacements in REP definitions: ``It's very useful to
|
|
define replacements for the most typical one-character mistakes, too:
|
|
with REP you can add higher priority to a subset of the TRY suggestions
|
|
(suggestion list begins with the REP suggestions).''
|
|
|
|
2005-11-11 Németh László <nemethl@gyorsposta.hu>:
|
|
* src/hunspell/affixmgr.*: fix Unicode MAP errors (sorted only n-1
|
|
characters instead of n ones in UTF-16 MAP character lists).
|
|
Bug reported by Rene Engelhard.
|
|
|
|
* src/hunspell/affixmgr.*: fix infinite COMPOUND matching (default char
|
|
type is unsigned on PowerPC, s390 and ARM platforms and it will never
|
|
be negative). Bug reported by Rene Engelhard.
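Illustration of this class of bug: a loop that relies on a plain char
becoming negative never terminates where char is unsigned. Declaring the
variable as signed char (or int) makes the behaviour the same on every
platform. The function below is a hypothetical example, not the COMPOUND
matching code.

```cpp
// Count the iterations of a backward loop that stops below zero.
// `start` is assumed to be in the range 0..127.
int count_backward_steps(int start) {
    // Explicitly signed: a plain `char` here would loop forever on
    // platforms where char is unsigned, because `i >= 0` is always true.
    signed char i = static_cast<signed char>(start);
    int steps = 0;
    while (i >= 0) {
        --i;
        ++steps;
    }
    return steps;
}
```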
|
|
|
|
* src/hunspell/{affixmgr,suggestmgr}.cxx: fix bad ONLYINCOMPOUND
|
|
word suggestions.
|
|
* tests/onlyincompound.sug: empty test file to check this fix.
|
|
Bug reported by Björn Jacke.
|
|
|
|
* src/hunspell/affixmgr.cxx: fix backtracking in COMPOUND pattern matching.
|
|
* tests/compound6.*: test files to check this fix.
|
|
|
|
* csutil.cxx: set bigger range types in flag_qsort() and flag_bsearch().
|
|
|
|
* affixmgr.hxx: set better type for cont_classes[] Boolean data (short -> char)
|
|
|
|
* configure.ac, tests/automake.am: set platform specific XFAIL test
|
|
(flagutf8.test on ARM platform)
|
|
|
|
2005-11-09 Németh László <nemethl@gyorsposta.hu>:
|
|
improvements:
|
|
* src/hunspell/affixmgr.*: new and improved affix file parameters:
|
|
|
|
- COMPOUND definitions: compound patterns with regexp-like matching.
|
|
See manual and test files: tests/compound*.*
|
|
Suggested by Bram Moolenaar.
|
|
Also useful for simple word-level lexical scanning, for example
|
|
analysing numbers or words with numbers (OOo Issue #53643):
|
|
http://qa.openoffice.org/issues/show_bug.cgi?id=53643
|
|
Examples: tests/compound{4,5}.*.
|
|
|
|
- NOSUGGEST flag: words marked with the NOSUGGEST flag are not suggested.
|
|
Proposed flag for vulgar and obscene words (OOo Issue #55498).
|
|
Example: tests/nosuggest.*.
|
|
Problem reported by bobharvey at OOo:
|
|
http://qa.openoffice.org/issues/show_bug.cgi?id=55498
|
|
|
|
- KEEPCASE flag: Forbid capitalized and uppercased forms of words
|
|
signed with KEEPCASE flags. Useful for special orthographies
|
|
(measurements and currency often keep their case in uppercased
|
|
texts) and other writing systems (eg. keeping lower case of IPA
|
|
characters).
|
|
|
|
- CHECKCOMPOUNDCASE: Forbid upper case characters at word boundaries in compounds.
|
|
Examples: tests/checkcompoundcase* and tests/germancompounding.*
|
|
|
|
- FLAG UTF-8: New flag type: Unicode character encoded with UTF-8.
|
|
Example: tests/flagutf8.*.
|
|
Rationale: Unicode character type can be more readable
|
|
(in a Unicode text editor) than `long' or `num' flag type.
|
|
|
|
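Illustration for the CHECKCOMPOUNDCASE entry above: a minimal sketch of the
boundary test, assuming ASCII input. The helper name upper_at_boundary() is
hypothetical and is not taken from the Hunspell sources.

```cpp
#include <cctype>
#include <string>

// Hypothetical helper (not Hunspell's API): returns true when a compound
// split between `left` and `right` has an upper case ASCII letter on
// either side of the boundary, the situation CHECKCOMPOUNDCASE forbids.
bool upper_at_boundary(const std::string& left, const std::string& right) {
    if (left.empty() || right.empty()) return false;
    unsigned char a = static_cast<unsigned char>(left.back());
    unsigned char b = static_cast<unsigned char>(right.front());
    return std::isupper(a) || std::isupper(b);
}
```

A checker following this idea would reject the split foo|Bar and accept
foo|bar; non-ASCII letters would need a Unicode-aware case test instead.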
bug fixes:
* src/hunspell/hunspell.cxx: accept numbers and numbers with separators (i53643)
Bug reported by skelet at OOo:
http://qa.openoffice.org/issues/show_bug.cgi?id=53643

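Illustration for the entry above: a minimal sketch of accepting digit runs
joined by single separators. The separator set ('.', ',' and '-') and the
helper name is_separated_number() are assumptions made for this example,
not a description of the committed code.

```cpp
#include <cctype>
#include <string>

// Hypothetical helper: true for "12", "1,234.56" or "2005-11-09";
// false for doubled, leading or trailing separators.
bool is_separated_number(const std::string& s) {
    bool prev_digit = false;
    for (char ch : s) {
        unsigned char c = static_cast<unsigned char>(ch);
        if (std::isdigit(c)) {
            prev_digit = true;
        } else if ((ch == '.' || ch == ',' || ch == '-') && prev_digit) {
            prev_digit = false;  // a separator must be followed by a digit
        } else {
            return false;        // letter, or separator without a digit before it
        }
    }
    return prev_digit;           // non-empty and ends on a digit
}
```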
* src/hunspell/csutil.cxx: fix casing data in ISO 8859-13 character table.

* src/hunspell/csutil.cxx: add ISO-8859-15 character encoding (i54980)
Rationale: ISO-8859-15 is the default encoding of the French OpenOffice.org
dictionary. ISO-8859-15 is a modified version of the ISO-8859-1
(latin-1) character encoding with French œ ligatures and the euro
symbol. Problem reported by cbrunet at OOo in OOo Issue 54980:
http://qa.openoffice.org/issues/show_bug.cgi?id=54980

* src/hunspell/affixmgr.cxx: fix zero-byte malloc after a bad affix header.
Patch by Harri Pitkänen.

* src/hunspell/suggestmgr.cxx: fix bad NEEDAFFIX word suggestion
in ngram suggestions. Reported by Daniel Naber and Friedel Wolff.

* src/hunspell/hashmgr.cxx: fix bad white space checking in affix files.
src/hunspell/{csutil,affixmgr}.cxx: add other white space separators.
Problems with tab characters reported by Frederik Fouvry.

* src/hunspell/*: replace system-dependent <license.*> #include
parameters with quoted ones. Problem reported by Dafydd Jones.

* src/hunspell/hunspell.cxx: fix missing morphological analysis of dot(s)
Reported by Trón Viktor.

changes:
* src/hunspell/affixmgr.cxx: rename PSEUDOROOT to NEEDAFFIX.
Suggested by Bram Moolenaar.

* src/hunspell/suggestmgr.hxx: Increase default maximum of
ngram suggestions (3->5). Suggested by Kevin Hendricks.

* src/hunspell/htypes.hxx: Increase MAXDELEN for long affix flags.

* src/hunspell/suggestmgr.cxx: modify (perhaps fix) Unicode map suggestion.
Failure of the tests/maputf test on the ARM platform reported by Rene Engelhard.

* src/hunspell/{affentry.cxx,atypes.hxx}: remove [PREFIX] and
MISSING_DESCRIPTION messages from morphological analysis.
Problems reported by Trón Viktor.

* tests/germancompounding.{aff,good}: Add "Computer-Arbeit" test word.
Suggested by Daniel Naber.

* doc/man/hunspell.4: Proof-reading patch by Goldman Eleonóra.

* doc/man/hunspell.4: Fix bad affix example (replace `move' with `work').
Bug reported by Frederik Fouvry.

* tests/*: new test files:
affixes.*: simple affix compression example from the hunspell(4) manual page
checkcompoundcase.*, checkcompoundcase2.*, checkcompoundcaseutf.*
compound.*, compound2.*, compound3.*, compound4.*, compound5.*
compoundflag.* (former compound.*)
flagutf8.*: test for FLAG UTF-8
germancompounding.*: simplification with CHECKCOMPOUNDCASE.
germancompoundingold.* (former germancompounding.*)
i53643.*: check numbers with separators
i54980.*: ISO8859-15 test
keepcase.*: test for KEEPCASE
needaffix*.* (former pseudoroot*.* tests)
nosuggest.*: test for NOSUGGEST

2005-09-19 Németh László <nemethl@gyorsposta.hu>:
* src/hunspell/suggestmgr.cxx: improved ngram suggestion:
- detect swapped non-adjacent characters (pernament -> permanent)
Rationale: the ngram method scores swapped non-adjacent characters
poorly, especially when the swap is in the middle of the word.
- suggest uppercase forms (unesco -> UNESCO, siggraph's -> SIGGRAPH's)
- suggest only the swapped-character and uppercase forms, if they exist.
Rationale: swapped-character and case-equivalent forms give much better
suggestions than any other (weighted) ngram suggestion.
- add uppercase suggestion (PERMENANT -> PERMANENT)

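Illustration for the swapped-character detection above: a minimal sketch
that tests whether a misspelling differs from a candidate by exactly one
exchange of two characters, adjacent or not. The helper name is_swap_of()
is hypothetical, and the sketch compares bytes, so it assumes an 8-bit
encoding.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: true when `bad` becomes `good` by exchanging the
// characters at exactly two positions (e.g. pernament -> permanent).
bool is_swap_of(const std::string& bad, const std::string& good) {
    if (bad.size() != good.size()) return false;
    std::size_t first = 0, second = 0;
    int diffs = 0;
    for (std::size_t i = 0; i < bad.size(); ++i) {
        if (bad[i] != good[i]) {
            if (diffs == 0) first = i;        // remember first mismatch
            else if (diffs == 1) second = i;  // remember second mismatch
            ++diffs;
        }
    }
    // Exactly two mismatches, and they mirror each other.
    return diffs == 2 && bad[first] == good[second] && bad[second] == good[first];
}
```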
* src/hunspell/*: complete comparison with MySpell 3.2 (in OOo beta 2):
- affixmgr.cxx: add missing numrep initialization
- hashmgr.cxx: add_word(): don't allocate temporary records
- hunspell.cxx: in suggest():
  - check capitalized words first (better suggestion order for proper names),
  - check pSMgr->suggest() return value
  - make the pSMgr->suggest() call unconditional in HUHCAP
- csutil.cxx: fix bad KOI8-U -> koi8r_tbl reference in enc_entry encds
- csutil.cxx: fix casing data in ISO 8859-2, Windows 1251 and KOI8-U
encoding tables. Bug reported by Dmitri Gabinski.

* src/hunspell/affixmgr.*: improved compound word and other features
- generalize hu_HU specific compound word features with new affix file
parameters, suggested by Bram Moolenaar:
  - CHECKCOMPOUNDDUP: forbid word duplication in compounds (e.g. foo|foo)
  - CHECKCOMPOUNDTRIPLE: forbid triple letters in compounds (e.g. foo|obar)
  - CHECKCOMPOUNDPATTERN: forbid patterns at word boundaries in compounds
  - CHECKCOMPOUNDREP: using the REP replacement table, forbid presumably bad
  compounds (useful for languages with an unlimited number of compounds)
- ONLYINCOMPOUND flag also works with words (see tests/onlyincompound.*)
Suggested by Daniel Naber, Björn Jacke, Trón Viktor & Bram Moolenaar.
- PSEUDOROOT also works with prefixes and prefix + suffix combinations
(see tests/pseudoroot5.*). Suggested by Trón Viktor.
- man/hunspell.4: updated man page

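Illustration for CHECKCOMPOUNDDUP and CHECKCOMPOUNDTRIPLE above: a minimal
sketch of the two boundary tests on a single compound split. Both helper
names are hypothetical, and the byte-wise comparison assumes an 8-bit
encoding.

```cpp
#include <string>

// CHECKCOMPOUNDDUP-style test: the same word on both sides (foo|foo).
bool is_duplicate(const std::string& left, const std::string& right) {
    return left == right;
}

// CHECKCOMPOUNDTRIPLE-style test: three identical letters meet at the
// boundary, either as xx|x (foo|obar) or as x|xx (fo|oobar).
bool triple_at_boundary(const std::string& left, const std::string& right) {
    if (left.empty() || right.empty()) return false;
    char c = left.back();
    if (right.front() != c) return false;
    bool two_on_left = left.size() >= 2 && left[left.size() - 2] == c;
    bool two_on_right = right.size() >= 2 && right[1] == c;
    return two_on_left || two_on_right;
}
```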
* src/hunspell/affixmgr.*: fix incomplete prefix handling with twofold
suffixes (delete unnecessary contclasses[] conditions in
prefix_check_twosfx() and prefix_check_twosfx_morph()).
Bug reported by Trón Viktor.

* src/hunspell/affixmgr.*: also complete the *_morph() functions with
conditions of new Hunspell features (circumfix, pseudoroot etc.).

* src/hunspell/suggestmgr.cxx:
- fix missing suggestions for words with crossed prefix and suffix
- fix redundant non-compound word checking
- fix lost suggestions. Bug reported by Dmitri Gabinski.

* src/hunspell/dictmgr.*:
- add new dictionary manager for the Hunspell UNO module
Problems with eo_ANY Esperanto locale reported by Dmitri Gabinski.

* src/hunspell/*: use precise constant sizes for 8-bit and 16-bit character
arrays with MAXWORDUTF8LEN and MAXSWUTF8L macros.

* src/hunspell/affixmgr.cxx: fix bad MAXNGRAMSUGS parameter handling

* src/hunspell/affixmgr.cxx, src/tools/{un}munch.*: fix GCC 4.0 warnings
on fgets(), reported by Dvornik László

* po/hu.po: improved translation by Dvornik László

* tests/test.sh: improved test environment
- add suggestion testing (see tests/*.sug)
- add memory debugging environment, based on the excellent Valgrind debugger.
Usage on Linux and experimental platforms of Valgrind:
VALGRIND=memcheck make check
- rename test_hunmorph to test.sh

* tests/*: new tests:
- base.*: base example based on MySpell's checkme.lst.
- map{,utf}.*, rep{,utf}: MAP and REP suggestion examples
- tests on new CHECKCOMPOUND, ONLYINCOMPOUND and PSEUDOROOT features
- i54633.*: capitalized suggestion test for Issue 54633 from OOo's Issuezilla
- i35725.*: improved ngram suggestion test for Issue 35725

2005-08-26 Németh László <nemethl@gyorsposta.hu>:
improvements:

* src/hunspell/suggestmgr.cxx:
Unicode support in related character map suggestion

* src/hunspell/suggestmgr.cxx: Unicode support in ngram suggestion

* src/hunspell/{suggestmgr,affixmgr,hunspell}.cxx: improve ngram suggestion.
Fix http://qa.openoffice.org/issues/show_bug.cgi?id=35725. See release
notes for examples. Problem reported by beccablain at OOo.
- ngram suggestions are now case insensitive (see `Permenant' bug in Issuezilla)
- weight ngram suggestions (with the longest common subsequence algorithm,
also considering the lengths of the bad word and the suggestion, identical
first letters and almost completely identical character positions)
- set strict affix congruency in expand_rootword(). Now ngram suggestions
are good for languages with rich morphology and also better for English.
Rationale: affixed forms of the first ngram suggestion
very often suppress the second and subsequent root word suggestions. But
faults in affixes are less common, and can be fixed without suggestions.
We must prefer the more informative second and subsequent root word
suggestions over suggestions for bad affixes.
- a better suggestion must not be a substring of a worse suggestion
Rationale: Suggesting affixed forms of a root word is
unnecessary when the root word has a better weighted ngram value.
(Checking substrings is a good approximation for this refinement.)
- fewer ngram suggestions (default maximum is 3 instead of 10)
Rationale: Users need a big extra effort to check a lot of bad ngram
suggestions, nine times out of ten unnecessarily. It is very
distracting, because ngram suggestions can be very different.
MySpell and Hunspell usually give one or two suggestions with
the old suggestion algorithms (maximum is 15), but the ngram algorithm
often gives the maximum number of suggestions. With strict affix congruency
and other refinements, the good suggestion is usually among the
first three elements.
- new affix parameter: MAXNGRAMSUGS

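Illustration for the weighting entry above: a minimal sketch of the
longest common subsequence length, one of the ingredients named there.
The function name lcs_length() is hypothetical, the sketch works on bytes,
and it does not reproduce the other weighting terms (word lengths, first
letters, character positions).

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Length of the longest common subsequence of `a` and `b`, computed with
// the classic dynamic programming table reduced to two rows.
std::size_t lcs_length(const std::string& a, const std::string& b) {
    std::vector<std::size_t> prev(b.size() + 1, 0), cur(b.size() + 1, 0);
    for (std::size_t i = 1; i <= a.size(); ++i) {
        for (std::size_t j = 1; j <= b.size(); ++j) {
            cur[j] = (a[i - 1] == b[j - 1])
                         ? prev[j - 1] + 1
                         : std::max(prev[j], cur[j - 1]);
        }
        std::swap(prev, cur);  // the finished row becomes the previous row
    }
    return prev[b.size()];
}
```

For the misspelling permenant and the candidate permanent this gives 7 of
9 characters, which a weighting scheme could use to rank the candidate.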
* src/hunspell/*: support agglutinative languages with rich prefix
morphology or with right-to-left writing system (for example, Turkic
and Austronesian languages with (modified) Arabic scripts).
- new affix parameter: COMPLEXPREFIXES
Set twofold prefix stripping (but single suffix stripping)
* src/hunspell/affixmgr.cxx:
- speed up prefix loading with tree sorting algorithm.
* tests/complexprefixes.*, tests/complexprefixesutf.*:
Coptic example posted by Moheb Mekhaiel

* src/hunspell/hashmgr.cxx: check size attribute in dic file
suggested by Daniel Naber
Rationale: With a missing size attribute Hunspell allocates a hash table
that is too small and slower, and Hunspell can lose the first dictionary word.

* src/hunspell/affixmgr.cxx: check stripping characters and condition
compatibility in affix rules (bugs detected in cs_CZ, es_ES, es_NEW,
es_MX, lt_LT, nn_NO, pt_PT, ro_RO and sk_SK dictionaries). See release
notes of Hunspell 1.0.9 in NEWS.

* src/hunspell/affixmgr.cxx: check unnecessary fields in affix rules
(bugs detected in ro_RO and sv_SE dictionaries). See release notes.

* src/hunspell/affixmgr.cxx: remove redundant condition checking
in affix rules with stripping characters (redundancy in OpenOffice.org
dictionaries reported by Eleonóra Goldman)
Rationale: this is a small optimization, but it was excellent for
detecting bad ngram affixation caused by bad or weak affix conditions.

* tests/germancompounding.aff: improve compound definition
- use dash prefix instead of language specific tokenizer
Rationale: Using a uniform approach is the right way to check and analyze
compound words. Language specific word breaking is deprecated; word-like
word pairs need sophisticated grammar checking
(for example in Hungarian there is a substandard, but accepted
syntax with dash for word pairs: cats, dogs -> kutyák-macskák (like
cats/dogs in English)).

* test Hunspell with 54 OpenOffice.org dictionaries: see release notes

bug fixes:

* src/hunspell/suggestmgr.*: add time limit to exponential
algorithm of the related character map suggestion
Rationale: a long word in agglutinative languages or a special pattern
(for example a horizontal rule) made of map characters can `crash' the
spell checker.

* src/hunspell/affentry.cxx: add() functions: fix bad word generation
by checking stripping characters (see similar bug in unmunch)

* src/hunspell/affixmgr.cxx: parse_file(): fix unconditional getNext()
call for ~AffixMgr() when affix file is corrupt.

* src/hunspell/affixmgr.*: AffixMgr(), parse_cpdsyllable(): fix missing
string duplications for ~AffixMgr() when affix file is corrupt.

* src/hunspell/affixmgr.*: parse_affix(): fix fprintf() call when affix
file is corrupt. Bug reported by Daniel Naber.

* suggestmgr.cxx: replace single usage of 'strdup' with 'mystrdup'
patch by Chris Halls (debian.org)

* src/hunspell/makefile.mk: add makefile.mk for compiling in OpenOffice.org
See README in the Hunspell UNO module.
Problems with separate compilation reported by Rene Engelhard

* src/hunspell/hunspell.cxx: fix pseudoroot support
- search for a non-pseudoroot homonym in check()
* tests/pseudoroot4.*: test this fix

* src/tools/unmunch.c: fix bad word generation when conditions
are shorter or incompatible with stripping characters in affix rules

* src/tools/unmunch.c: fix mychomp() for de_AT.dic and other dic files
without a final newline character.

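Illustration for the mychomp() entry above: a minimal sketch of a chomp
helper that is safe for a last line without a newline. The name chomp()
and the std::string interface are assumptions for this example; the
ChangeLog does not show the actual mychomp() code.

```cpp
#include <string>

// Hypothetical stand-in for a chomp helper: removes a trailing '\n' and
// then a trailing '\r' if present, and leaves any other input untouched,
// so a final line without a newline keeps its last character.
std::string chomp(std::string line) {
    if (!line.empty() && line.back() == '\n') line.pop_back();
    if (!line.empty() && line.back() == '\r') line.pop_back();
    return line;
}
```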
other changes:
* src/hunspell/suggestmgr.*: remove ACCENT suggestion
Rationale: ACCENT suggestion was the same as Kevin Hendricks's map
suggestion algorithm, but with a worse interface in the affix file.

* src/hunspell/suggestmgr.*: combine cycle number limit
in badchar() and forgotchar() with a time limit.

* src/hunspell/affixmgr.*: remove NOMAPSUGS affix parameter

* src/hunspell/{suggestmgr,hunspell}.*: strip periods from
suggestions (restore MySpell's original behaviour)
Rationale: OpenOffice.org has an automatic period handling mechanism
and suggestions look better without periods.
- new affix file parameter: SUGSWITHDOTS
Add period(s) to suggestions, if input word terminates in period(s).
(Not needed for OpenOffice.org dictionaries.)

* tests/germancompounding.aff: improve bad German affix in affix example
(computeren->computern). Suggested by Daniel Naber.

* src/tools/example.cxx: add MySpell's example

* src/tools/munch.cxx: add MySpell's munch

* man{,/hu}/hunspell.4: refresh manual pages

2005-08-01 Németh László <nemethl@gyorsposta.hu>:
* add missing MySpell files and features:
- add MySpell license.readme, README and CONTRIBUTORS ({license,README,AUTHORS}.myspell)
- add MySpell unmunch program (src/tools/unmunch.c)
- add licenses to source (src/hunspell/license.{myspell,hunspell})
- port MAP suggestion (with imperfect UTF-8 support)
- add NOSPLITSUGS affix parameter
- add NOMAPSUGS affix parameter

* src/man/man.4: MAP, COMPOUNDPERMITFLAG, NOSPLITSUGS, NOMAPSUGS

* src/hunspell/aff{entry,ixmgr}.cxx:
- improve compound word support
- new affix parameter: COMPOUNDPERMITFLAG (see manual)
* src/tests/compoundaffix{,2}.*: examples for COMPOUNDPERMITFLAG
* src/tests/germancompounding.*: new solution for German compounding
Problems with German compounding reported by Daniel Naber

* src/hunspell/hunspell.cxx: fix German uppercase word spelling
with the spellsharps() recursive algorithm.
Default recursion depth is 5 (MAXSHARPS).
* src/tests/germansharps*: extended German sharp s tests

* src/tools/hunspell.cxx: fix fatal memory bug in non-interactive
subshells without the HOME environment variable
Bug detected with PHP by András Izsók.

2005-07-22 Németh László <nemethl@gyorsposta.hu>:
* src/hunspell/csutil.hxx: utf16_u8()
- fix 3-byte UTF-8 character conversion

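Illustration for the entry above: a minimal sketch of converting one
UTF-16 code unit from the Basic Multilingual Plane to UTF-8, including the
3-byte case. The function name bmp_to_utf8() is hypothetical and this is
not the utf16_u8() code; surrogate pairs are out of scope here.

```cpp
#include <string>

// Encodes one BMP code point (a single non-surrogate UTF-16 code unit)
// as a 1-, 2- or 3-byte UTF-8 sequence.
std::string bmp_to_utf8(unsigned short u) {
    std::string out;
    if (u < 0x80) {            // 0xxxxxxx
        out += static_cast<char>(u);
    } else if (u < 0x800) {    // 110xxxxx 10xxxxxx
        out += static_cast<char>(0xC0 | (u >> 6));
        out += static_cast<char>(0x80 | (u & 0x3F));
    } else {                   // 1110xxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xE0 | (u >> 12));
        out += static_cast<char>(0x80 | ((u >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (u & 0x3F));
    }
    return out;
}
```

The euro sign U+20AC, for example, falls into the 3-byte branch and
encodes as the bytes E2 82 AC.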
2005-07-21 Németh László <nemethl@gyorsposta.hu>:
* src/hunspell/csutil.hxx: hunspell_version() for the OOo UNO module

2005-07-19 Németh László <nemethl@gyorsposta.hu>:
* renaming:
- src/morphbase -> src/hunspell
- src/hunspell, src/hunmorph -> src/tools
- src/huntokens -> src/parsers

* src/tools/hunstem.cxx: add stemmer example

2005-07-18 Németh László <nemethl@gyorsposta.hu>:
* configure.ac: --with-ui, --with-readline configure options
* src/hunspell/hunspell.cxx: fix conditional compiling

* src/hunspell/hunspell.cxx: put the HunSPELL.bak temporary file
in the same directory as the checked file.

* src/morphbase/morphbase.cxx:

- handling German sharp s (ß)

- fix (temporarily) analyze()

* tests: a lot of new tests

* po/, intl/, m4/: add gettext from GNU hello

* po/hu.po: add Hungarian translation

* doc/, man/: rename doc to man

2005-07-04 Németh László <nemethl@gyorsposta.hu>:
* src/morphbase/hashmgr.cxx: set FLAG attribute instead of FLAG_NUM and FLAG_LONG

* doc/hunspell.4: manual in English

2005-06-30 Németh László <nemethl@gyorsposta.hu>:
* src/morphbase/csutil.cxx: add character tables from csutil.cxx of OOo 1.1.4

* src/morphbase/affentry.cxx: fix Unicode condition checking

* tests/{,utf}compound.*: compounding tests

2005-06-27 Németh László <nemethl@gyorsposta.hu>:
* src/morphbase/*: fix Unicode compound handling

2005-06-23 Halácsy Péter:
* src/hunmorph/hunmorph.cxx: delete spelling error message and suggest_auto() call

2005-06-21 Németh László <nemethl@gyorsposta.hu>:
* src/morphbase: Unicode support
* tests/utf8.*: SET UTF-8 test

* src/morphbase: checking and fixing with Valgrind
Memory handling error reported by Ferenc Szidarovszky

2005-05-26 Németh László <nemethl@gyorsposta.hu>:
* suggestmgr.cxx: fix stemming
* AUTHORS, COPYING, ChangeLog: set CC-LGPL free software license

2004-05-25 Varga Dániel <daniel@all.hu>
* src/stemtool: new subproject

2005-05-25 Halácsy Péter <peter@halacsy.com>
* AUTHORS, COPYING: set CC Attribution license

2004-05-23 Varga Dániel <daniel@all.hu>
* src: modifications for compiling with Visual C++

* src/hunmorph/csutil.cxx: correct header of flag_qsort()
* src/hunmorph/*: correct csutil include

2005-05-19 Németh László <nemethl@gyorsposta.hu>
* csutil.cxx: fix loop condition in lineuniq()
bug reported by Viktor Nagy (nagyv nyelvtud hu).

* morphbase.cxx: handle PSEUDOROOT with zero affixes
bug reported by Viktor Nagy (nagyv nyelvtud hu).
* tests/zeroaffix.*: add zeroaffix tests

2005-04-09 Németh László <nemethl@gyorsposta.hu>
* config.h.in: reset with autoheader

* src/hunspell/hunspell.cxx: set version

2005-04-06 Németh László <nemethl@gyorsposta.hu>
* tests: tests

* src/morphbase:
New optional parameters in affix file:
- PSEUDOROOT: forbids a root while allowing its suffixed forms.
- COMPOUNDWORDMAX: max. words in compounds (default is no limit)
- COMPOUNDROOT: marks compounds in the dictionary for handling special compound rules
- remove COMPOUNDWORD, ONLYROOT

2005-03-21 Németh László <nemethl@gyorsposta.hu>
* src/morphbase/*:
- 2-byte flags, FLAG_NUM, FLAG_LONG
- CIRCUMFIX: marked suffixes and prefixes can only occur together
- ONLYINCOMPOUND for fogemorphemes (Swedish, Danish) or Fuge-elements (German)
- COMPOUNDBEGIN: allow marked roots, and roots with a marked suffix, at the beginning of compounds
- COMPOUNDMIDDLE: like before, but in the middle of compounds
- COMPOUNDEND: like before, but at the end of compounds
- remove COMPOUNDFIRST, COMPOUNDLAST