Thursday, July 6, 2017

LaTeX Devanagari- IV: The final solution

Hi all! I am back after a long time with a new post! And we have a great announcement for the followers of this series of articles: we are announcing our book that describes how to use LaTeX for writing Indian languages; see the first section below.

This post, in my opinion, discusses the best way of using (Xe)LaTeX for writing Indian languages. I think that this is going to be the final article in the `LaTeX Devanagari' series. Any further articles on this topic would discuss issues in Polyglossia and not how to write Indian scripts in LaTeX.
Before I proceed, let me praise the package: this is a wonderful method! Recall that our quest started with writing "Devanagari" in LaTeX and, as I mentioned earlier, Polyglossia allows us to write "languages"; that is, it knows the hyphenation patterns, the names of various titles in the chosen language, and so on. Unlike Babel, Polyglossia offers typesetting for many Indian languages such as Sanskrit, Tamil, Marathi, Telugu, Bengali, Malayalam and, of course, Hindi.
The best part of using this method is that the input is written in the standard script of the chosen language itself; a transliteration is not required. Thus, to write Sanskrit, Marathi or Hindi, the input is given in Devanagari. To write Tamil, the input is written in Tamil.
Furthermore, the writer may choose any "Unicode" font on her machine (see our article for the discussion of fonts; click here).

I owe Mr. Sushant Devalekar (श्री. सुशान्त देवळेकर) special thanks for fruitful discussions and bringing many issues to my notice.

Do see the links given at the end of the post.

Previous articles- 1, 2, 3

  The announcement

Firstly, this is an announcement that I have written a guide for using the package Polyglossia for Indian languages. The guide is published on the "Comprehensive TeX Archive Network" (CTAN), the central repository of LaTeX packages, and it is typeset using Polyglossia. It is released under the "LaTeX Project Public License 1.3 and onwards" and is available free for academic purposes; you may download it from CTAN: click here. And the guide is written in Marathi! Thus, I think that this is the first LaTeX book(let) written in an Indian language.

We are not going to discuss syntax in any detail in this article, but rather some theoretical aspects of editing programmes, especially (Xe)LaTeX. Since the book describes how to use Polyglossia, there is no need to repeat it here. Curious readers may simply download the book and get their hands on Polyglossia! Click here or here to get the book. The minimal preamble for an Indian language is discussed on pages 7 to 9 of the book.

If you are going to use "MARATHI" in your document, DO LOOK at "महत्वाचे" on PAGE 8 of the booklet before compiling your document. Otherwise, you may encounter the error "Kernel choice unknown" or some fontspec error.

Polyglossia

The package Babel provides multilingual support for (plain) TeX and LaTeX. Once you choose a language for your document, Babel typesets the (La)TeX document according to the culturally determined typographical rules and hyphenation patterns of that language. Babel offers support for a number of languages including Hindi and Bengali (Bengali for Bangladesh). Babel uses the "velthuis" or "devnag" package; "velthuis" is the package described in our first article for processing Hindi TeX documents. However, Babel does not support other regional Indian languages, not even Sanskrit.

We never described XeLaTeX in detail earlier. Let me add a few lines of explanation. XeTeX is a typesetting engine based on a merger of TeX, Unicode and modern font technologies. It was designed so that TeX can work directly with modern fonts and produce beautiful output with them. The LaTeX counterpart of XeTeX is called XeLaTeX. Xe(La)TeX was initially developed for Mac OS X, but it is now available for all major systems.

Similarly, LuaTeX is a typesetting engine that embeds a lightweight scripting language called Lua. LuaLaTeX is its LaTeX counterpart. Now it is easy to explain what Polyglossia is!

Polyglossia is a complete Babel replacement in LuaLaTeX and Xe(La)TeX.

Thus, in LuaLaTeX and XeLaTeX, Polyglossia offers traditional typesetting and hyphenations for many languages. Due to XeLaTeX's capability of working with fonts, Polyglossia offers a wide spectrum of languages to work with. It offers support for many major Indian languages some of which are Sanskrit, Hindi, Marathi, Tamil, Telugu, Malayalam, Bengali, Kannada and Urdu.
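
To give a rough idea of what this looks like in practice, here is a minimal sketch of a Polyglossia document; the complete preamble is the one described in the book. The font name "Lohit Devanagari" is only an assumption: replace it with any Unicode Devanagari font installed on your machine. Hindi is used as the main language here; for Marathi, do read the note in the booklet mentioned above first.

```latex
% A minimal sketch; compile with XeLaTeX (not pdfLaTeX).
\documentclass{article}
\usepackage{fontspec}      % font selection for XeLaTeX and LuaLaTeX
\usepackage{polyglossia}   % language support

\setdefaultlanguage{hindi}   % main language of the document
\setotherlanguage{english}   % a second language used occasionally

% Assumption: "Lohit Devanagari" is installed; any Unicode
% Devanagari font on your system can be used instead.
\newfontfamily\devanagarifont[Script=Devanagari]{Lohit Devanagari}

\begin{document}
यह एक उदाहरण है।   % the input is typed directly in Devanagari

\textenglish{This sentence is typeset as English.}
\end{document}
```

The language is chosen with \setdefaultlanguage, and the font for the script is declared separately, so changing the font does not require touching the text of the document.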

Offers support? What does that mean?

LaTeX does not merely produce DVI or PDF output; it also typesets. What does "typesetting" mean? Every language has certain conventions about how it is written.

For example, on ruled paper, the letters of English or French are written as if they are standing on the lines, whereas the letters of Sanskrit or Marathi are written as if they are hanging from the lines. Furthermore, in English the gap after a full stop is larger than the gap after a comma, whereas in French both gaps are the same. Most languages read from left to right, but Arabic reads from right to left. And Chinese is sometimes written in vertical columns, starting from the top of the rightmost column, going down the column, then moving to the top of the second column from the right, and so on, finishing at the bottom of the leftmost column.

Speaking about Marathi, and even Sanskrit, nowadays we write one bar over each word, for example, एका आणि एकाच शब्दावर एक सबंध रेषा. But in many old texts, such as the one below, many different words are put under one continuous line. Thus, the typesetting of these texts is different from the typesetting of present-day ones.
The same line runs on the head of all words

Languages also have their own rules for hyphenation. All these together constitute typesetting. I think that the word typesetting has its origin in the classical way of printing, where one had to arrange types on blocks, then apply ink to them, and finally press the block on paper to get a print (see this article- click here). Typesetting, roughly, means arranging various characters according to the standards for writing the language; this includes hyphenation and managing the distances between various characters.

Thus, writing "Devanagari" in LaTeX is a very crude phrase. Writing Marathi, Sanskrit or Malayalam in LaTeX pays more respect to the abilities of LaTeX! This is why Babel and Polyglossia are needed. In the techniques discussed in previous articles, the typesetting rules are borrowed from Hindi. The first technical benefit of Polyglossia is that it offers native typesetting for the other languages which use the Devanagari script! Of course, the typesetting is limited by the availability of these rules in digital form. What I mean by this is that if some rule for typesetting Sanskrit is not available in digital form, then Polyglossia cannot apply it. For example, hyphenation for Marathi is not available in digital format†. Therefore, you can see that Polyglossia is unable to hyphenate Marathi text properly.

Benefits of Polyglossia

Apart from this technical benefit, let's talk about the main requirement which has driven you all to this article, namely, producing Devanagari output in LaTeX. If you compare the output of the earlier methods (Articles 1--3) with the one produced by Polyglossia, then you shall see that Polyglossia offers a smoother, sharper and more beautiful output. Of course, the output depends on the font. Selecting a font in LaTeX (Article 1) is a hard task. But Polyglossia allows the writer to choose "any Unicode" font available on the system very easily!

The best part of using Polyglossia is that it allows you to write the main input in the same script as the selected language! The two previous methods do not allow this; there you have to transliterate the text. I think that for a present-day user transliteration is annoying for three reasons. The first is that the Velthuis map, although very natural, differs at some points from the famous Google transliteration which is used everywhere (though Google transliteration is not a scientific one). Thus, when I wrote an article using the package velthuis, my collaborators had to first learn the transliteration and then write the files! Secondly, it is hard to spot mistakes in the long and short vowels a, i and u (ि and ी, ु and ू) and in a few consonants. Often you see too many mistakes only after compiling the file! Furthermore, if one of your collaborators chooses to write "kA" for "का" and another chooses to write "kaa" for "का", then reading the document and locating the mistakes becomes an extremely clumsy job, even for a document only a few pages long!

The final and the most important reason is that transliteration is not really the way to write a language!! (It is NOT a way when it is the horrible day-to-day transliteration used in Google input methods which is not scientific, and I would say, is killing Indian languages!)

Polyglossia overcomes these issues by allowing you to write the script of the language directly in the input.

We strongly recommend using the InScript keyboard to type Indic scripts (click here); it is available in all good operating systems, maybe under different names.
We strongly advise against using Google transliterate to type Indic scripts.

As seen in the first article, creating output from a .dn file is not a straightforward task. But Polyglossia, like the XeLaTeX technique (Article 2), allows direct compilation. One writes the document, selects the "XeLaTeX" engine in the editor, and simply processes the document to get the output!

The XeLaTeX technique in Article 2 has issues with math mode. However, writing math in roman in math mode works perfectly fine with Polyglossia! Polyglossia switches to English inside math mode. If the author wants, this switching to English in math mode can be overridden. My book does not discuss math, since "LaTeX and Math" is a topic for a book in its own right. However, I have tried using many complicated and involved math packages along with Polyglossia, and they work well.
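
As a small illustration of this, the following sketch mixes Hindi text with ordinary inline and displayed math. It assumes that the font "Lohit Devanagari" is installed (any Unicode Devanagari font will do) and that the file is compiled with XeLaTeX; amsmath is included only as an example of a commonly used math package.

```latex
% A minimal sketch; compile with XeLaTeX.
\documentclass{article}
\usepackage{amsmath}       % an example math package
\usepackage{polyglossia}
\setdefaultlanguage{hindi}

% Assumption: replace "Lohit Devanagari" with a Unicode
% Devanagari font that is available on your system.
\newfontfamily\devanagarifont[Script=Devanagari]{Lohit Devanagari}

\begin{document}
% Text is typed in Devanagari; the math is typed as usual.
पाइथागोरस प्रमेय के अनुसार $a^2 + b^2 = c^2$ होता है।

\[
  \int_0^1 x^2 \, dx = \frac{1}{3}
\]
\end{document}
```

Nothing special is needed inside the math environments; the formulas are written exactly as in an English document.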

Now I would like to announce that I am closing the issue of writing "Devanagari" in LaTeX. Further articles, if any, shall be discussing finer stuff related to Polyglossia.

The link to the Polyglossia download is given below; furthermore, I am taking the liberty of publicising the efforts of Mr. Sushant Deolekar to improve Marathi on computers by adding links to many of his works below. I request you to use the nice fonts he has collected, to visit and share his channel for Unicode in Marathi, and to look at his other efforts.

Footnote

†While searching for hyphenation rules for Marathi, I came across claims that they are available, but I could not find who has done this work, where it is, or an editor that uses these rules. A claim regarding Hindi was that LibreOffice implemented hyphenation for Hindi but, again, I could not verify it! Now I am curious whether hyphenation rules were made for Marathi and whether there is documentation that explains them. Experts can shed light on this; their comments shall save my time.

Links

  1. Sushant Deolekar's
  2. Polyglossia on CTAN- click here.
  3. Polyglossia manual- click here.
  4. My book on CTAN "A practical guide to LaTeX and Polyglossia for Marathi and other Indian languages"- click here or here.
  5. Report an issue/ bug related to Polyglossia on GitHub- click here.
  6. The issue about Marathi translation on GitHub- click here.
 
                                                           ∆  ∆  ∆