Nepali Spell Checker Documentation
This section provides an overview of the Nepali Spell Checker app, describes its major features and documents its main functionality.
Nepali Spell Checker makes it easy to spell-check Nepali text by providing relevant and accurate suggestions. Potentially misspelled words, i.e., words not found in the dictionary, are presented with relevant suggestions powered by a comprehensive set of Nepali words and phrases.
To spell-check, paste your Nepali document into the editor and click the "Check Spelling" button. Nepali Spell Checker analyzes the document and, whenever it finds a misspelled term (a word or a phrase), provides relevant suggestions. You can then replace the misspelled term with one of the suggestions, ignore the suggestions and leave the term unchanged, or enter your own replacement. You can let the spell checker go through the misspelled terms in order, or you can select any misspelled term in the document and resume spell-checking from that term.
When replacing a misspelled term with one of the suggestions, you can apply the change to that single occurrence only or to all occurrences of the term. Similarly, you can ignore a single occurrence of the term or all of its occurrences.
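The replace and ignore options can be pictured as operations on a tokenized document. The following Python sketch is illustrative only: the `ReviewSession` class, its method names and the plain set-based lexicon are hypothetical and do not describe the app's actual implementation.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ReviewSession:
    """Hypothetical model of one spell-check pass over a tokenized document."""
    tokens: list[str]
    lexicon: set[str]
    ignored_positions: set[int] = field(default_factory=set)  # "Ignore": one occurrence
    ignored_terms: set[str] = field(default_factory=set)      # "Ignore all": every occurrence

    def next_flagged(self, start: int = 0) -> int | None:
        """Return the index of the next flagged term at or after `start`, or None."""
        for i in range(start, len(self.tokens)):
            term = self.tokens[i]
            if (term not in self.lexicon
                    and i not in self.ignored_positions
                    and term not in self.ignored_terms):
                return i
        return None

    def replace(self, index: int, replacement: str) -> None:
        """Change only the occurrence at `index`."""
        self.tokens[index] = replacement

    def replace_all(self, term: str, replacement: str) -> None:
        """Change every occurrence of `term`."""
        self.tokens = [replacement if t == term else t for t in self.tokens]

    def ignore(self, index: int) -> None:
        self.ignored_positions.add(index)

    def ignore_all(self, term: str) -> None:
        self.ignored_terms.add(term)
```

Keeping ignored positions separate from ignored terms is what lets a single "Ignore" leave later occurrences of the same term flagged. Resuming from a selected term corresponds to calling `next_flagged(start=index)` with that term's position.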
How suggestions are made
The suggestions offered by Nepali Spell Checker are generated by an algorithm that draws on several sources, including the entries in the official Nepali dictionary, Nepali grammar and orthography rules, and the guidelines of Nepal Academy. The suggestions are ordered by objective factors, such as their popularity.
The approach used to generate suggestions can be broadly divided into two steps. In the first step, the term is looked up in a comprehensive set of words and phrases that includes a variety of entries, such as new words not yet included in the dictionary and proper nouns. If the term matches an entry in the set, the spell checker treats the term as correct and moves on to the next term.
If the term is not found in the set, the second step processes it against the grammar and orthography rules. A typical document often contains many valid terms, such as inflections and derived words, that are not root terms. The spell checker decomposes such a term into its components and looks up each component. If the lookup succeeds, i.e., if all components exist in the set, it then checks whether the components relate to each other according to the grammar and orthography rules.
The word "Gharma," meaning "in the house," illustrates the approach. It consists of the root word Ghar, meaning "house," and the case ending Ma, meaning "in." In the first step, the entire term is looked up. Although the term is valid, this lookup fails because Gharma does not exist as a single entry in the set; the root word and the case ending exist only as separate entries. The second step then decomposes the term into the root word and the case ending and looks up both components. Both lookups succeed, so the spell checker proceeds to the grammar and orthography rules. The term passes the grammar rule that a case ending can be applied to a noun, and the orthography rules that a case ending must appear after the root word and that the two must be written together as a single word. The spell checker therefore marks the term as correct.
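The two-step check can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the tiny `LEXICON` dictionary, its part-of-speech labels and the helper names are hypothetical, and the sketch handles only a single root-plus-suffix split with the one grammar rule from the example.

```python
# Hypothetical miniature lexicon mapping each entry to a part-of-speech label.
LEXICON = {
    "घर": "noun",         # Ghar, "house"
    "मा": "case_ending",  # Ma, "in"
}


def splits(term: str):
    """Yield every (root, suffix) split where both parts are lexicon entries."""
    for i in range(1, len(term)):
        root, suffix = term[:i], term[i:]
        if root in LEXICON and suffix in LEXICON:
            yield root, suffix


def satisfies_rules(root: str, suffix: str) -> bool:
    # Grammar: a case ending can be applied to a noun.
    # Orthography: the case ending follows the root and is joined to it;
    # order and joining are implied by how the split was made.
    return LEXICON[root] == "noun" and LEXICON[suffix] == "case_ending"


def is_correct(term: str) -> bool:
    # Step 1: look up the whole term.
    if term in LEXICON:
        return True
    # Step 2: decompose, look up each component, then apply the rules.
    return any(satisfies_rules(r, s) for r, s in splits(term))
```

Because the split is made on Unicode code points, some candidate splits cut a vowel sign away from its consonant; such fragments simply fail the lookup. A reversed form such as "माघर" (Ma before Ghar) decomposes into known components but fails the rule check.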
If, after both steps, the spell checker cannot determine that the term is correct, it looks for the closest matches in the set and builds a list of suggestions ranked by popularity and by how closely they match the term. These suggestions are then presented to the user. They may be entries from the set or other forms, e.g., auto-generated inflections.
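One common way to rank candidates by closeness and popularity is to compute the Levenshtein edit distance to each entry and break ties by frequency. This is an assumed technique for illustration only; the app's actual similarity measure and weighting are not documented here, and the `frequencies` counts and the `max_distance` and `limit` parameters are hypothetical.

```python
from __future__ import annotations


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance over Unicode code points."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(
                prev[j] + 1,               # delete ca
                cur[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = cur
    return prev[-1]


def suggest(term: str, frequencies: dict[str, int],
            max_distance: int = 2, limit: int = 5) -> list[str]:
    """Rank entries by edit distance, then by descending frequency."""
    candidates = []
    for word, freq in frequencies.items():
        d = edit_distance(term, word)
        if d <= max_distance:
            candidates.append((d, -freq, word))  # -freq: more popular sorts first
    candidates.sort()
    return [word for _, _, word in candidates[:limit]]
```

For example, with hypothetical counts `{"घर": 900, "घरमा": 500, "गर": 700, "पानी": 800}`, the input "घरम" yields "घर" and "घरमा" (both one edit away, ordered by popularity) followed by "गर" (two edits away).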
Many other rules are also applied. The rules are updated regularly to reflect changes to the Nepal Academy guidelines and to cover a wider range of use cases. The rules that catch the most common errors are those on conjugation, declension, derived words and grammatical categories such as aspect, number, gender and voice.
The following list outlines examples of common rules.
- Number suffixes and case endings should appear as suffixes to a noun phrase.
- A word that begins with a noun phrase can take a postposition, a number suffix and case endings as suffixes. Once a case ending has been applied, it must end the word, e.g., Gharma (noun followed by case ending). A case ending can still follow a postposition or a number suffix, e.g., Gharharuma (noun followed by number followed by case ending).
- Conjunctions and interjections must be written as standalone words.
- When multiple verbs combine to form one word, the ending vowel of every verb except the last one must be Raswa (short).
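The suffix-order rule and the Raswa rule above can be expressed as small checks. In this sketch the morpheme category labels are hypothetical, the input is assumed to be already decomposed into components, and the Raswa check only looks at the i/ī and u/ū vowel pairs (Raswa versus Dirgha).

```python
from __future__ import annotations

# Hypothetical morpheme category labels.
NOUN, NUMBER, POSTPOSITION, CASE = "noun", "number", "postposition", "case"


def valid_noun_word(categories: list[str]) -> bool:
    """Check the category sequence of a word that starts with a noun."""
    if not categories or categories[0] != NOUN:
        return False
    last = len(categories) - 1
    for i in range(1, len(categories)):
        if categories[i] not in (NUMBER, POSTPOSITION, CASE):
            return False
        if categories[i] == CASE and i != last:  # a case ending must end the word
            return False
    return True


# Long (Dirgha) vowel signs and letters: ी U+0940, ू U+0942, ई U+0908, ऊ U+090A.
DIRGHA_ENDINGS = ("\u0940", "\u0942", "\u0908", "\u090A")


def verbs_use_raswa(verb_parts: list[str]) -> bool:
    """True if no verb except the last one ends in a long i or u."""
    return not any(part.endswith(DIRGHA_ENDINGS) for part in verb_parts[:-1])
```

Under these checks, Gharma (`[NOUN, CASE]`) and Gharharuma (`[NOUN, NUMBER, CASE]`) pass, while a case ending followed by another suffix fails; "गरि" + "दियो" passes the Raswa check, and "गरी" + "दियो" does not.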
Reasons you might not see suggestions
To keep the analysis of a document focused, Nepali Spell Checker ignores certain types of terms by default.
Terms that are not written in Devanagari, and characters that are not used to write Nepali, are generally not checked and are not flagged as incorrect. For example, the English word "cat" is ignored.
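A simple way to skip non-Devanagari terms is to test each character against the Unicode Devanagari block (U+0900 to U+097F). This sketch assumes whitespace tokenization and treats a token as checkable only if every character falls in that block; the function names are hypothetical and the app's own criteria may differ.

```python
from __future__ import annotations

DEVANAGARI = range(0x0900, 0x0980)  # Unicode Devanagari block, U+0900 to U+097F


def is_devanagari(token: str) -> bool:
    """True if the token is non-empty and every character is in the Devanagari block."""
    return bool(token) and all(ord(ch) in DEVANAGARI for ch in token)


def terms_to_check(text: str) -> list[str]:
    """Split `text` on whitespace and keep only Devanagari tokens."""
    return [t for t in text.split() if is_devanagari(t)]
```

A production check would also need to allow characters such as the zero-width joiner and non-joiner, which Nepali text sometimes uses and which lie outside this block.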
When a Purnabiram (।) is attached to a word, i.e., when there is no space between the word and the Purnabiram, the Microsoft Word add-in ignores that word. This limitation comes from the Microsoft API used by the add-in and may be lifted in newer versions of the API.
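To illustrate why an attached Purnabiram matters, consider a tokenizer that splits only on whitespace: it yields the word and its Purnabiram (U+0964, DEVANAGARI DANDA) as a single token. The hypothetical helper below shows how a checker with its own tokenizer could separate the two; it is not a description of how the Microsoft Word add-in or the Microsoft API works.

```python
from __future__ import annotations

import re

PURNABIRAM = "\u0964"  # । DEVANAGARI DANDA


def tokenize(text: str) -> list[str]:
    """Split on whitespace, then split an attached Purnabiram into its own token."""
    tokens = []
    for chunk in text.split():
        # The capturing group keeps the Purnabiram in the re.split() result;
        # empty strings produced at the edges are dropped.
        tokens.extend(t for t in re.split(f"({PURNABIRAM})", chunk) if t)
    return tokens
```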
Autocorrect
In response to a popular enhancement request from our users, we are pleased to introduce the "Autocorrect" feature. With this feature, common errors are corrected automatically, without your having to accept a suggestion for each of these common misspelled words.
This feature is powered by a list of the most common misspelled words and their correct forms. While analyzing a term, Nepali Spell Checker checks the list and, if the term matches an entry, automatically replaces the term with that entry's correct form. Occasionally, a misspelled term matches multiple entries in the list; in that case, the spell checker presents all the matching correct forms to the user as suggestions. The list is updated regularly with new common misspelled words and their correct forms.
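The autocorrect behavior can be sketched as a lookup in a map from each common misspelling to its correct forms. The function name, the `enabled` flag (standing in for the "Autocorrect" checkbox) and the single illustrative entry are assumptions, not the app's actual list or code.

```python
from __future__ import annotations

# Illustrative entry only: a common misspelling mapped to its correct form(s).
COMMON_MISSPELLINGS: dict[str, list[str]] = {
    "विध्यालय": ["विद्यालय"],  # Vidyalaya, "school"
}


def apply_autocorrect(tokens, corrections, enabled=True):
    """Return (corrected tokens, {index: suggestions} for ambiguous matches)."""
    corrected: list[str] = []
    pending: dict[int, list[str]] = {}
    for i, token in enumerate(tokens):
        forms = corrections.get(token, []) if enabled else []
        if len(forms) == 1:
            corrected.append(forms[0])    # one match: replace automatically
        else:
            corrected.append(token)       # no match or ambiguous: keep as is
            if len(forms) > 1:
                pending[i] = list(forms)  # several matches: let the user choose
    return corrected, pending
```

Returning ambiguous matches separately, rather than picking one, mirrors the behavior described above: only unambiguous entries are replaced without asking.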
The "Autocorrect" feature is not enabled by default. To activate this feature, select the "Autocorrect" checkbox before spell-checking your document.
Benefits of using the app
By using the app, you help make it better, since the app is designed to learn from its usage: the more it is used, the better the algorithm gets. Its usage also contributes to new knowledge and insights about the Nepali language and its trends, which NLRC publishes in reports. These reports contain non-personally identifying information, such as top words over a given period and common misspelled words, and are useful to the community interested in learning and promoting the Nepali language.
Using the app for checking documents with a large amount of text
To keep the service fair for the community members who use the app regularly, and given our current infrastructure, the basic version of Nepali Spell Checker can check only documents with a limited amount of text. Nepali Spell Checker Pro and Nepali Spell Checker Enterprise do not have this limitation. If you need to spell-check a document with a large amount of text in one go, please let us know and we will do our best to accommodate your needs.