Cyril Nikolaev

Hyph 0.99

Read me

Introduction

The goal of the project is to give webmasters a way to use intelligent automatic hyphenation on their web pages.

Previously, you had to insert &shy; (soft hyphen) marks into your HTML document to let HTML 4.0 compatible browsers break a word. Unfortunately, even today browsers do not handle &shy; properly. Firefox doesn't support it at all. Opera and IE (including version 7) insert visible hyphens in place of soft hyphens when text is copied to the clipboard. Moreover, text containing &shy; marks is not search-engine friendly.

The W3C did notice the problem and included a definition of the hyphenate property in the CSS3 draft. Even so, the implementation of the hyphenation algorithm is left to browser vendors.

The Hyph script lets you use hyphenation right now. Unfortunately, the quality of automatic word division is not perfect. Hyph does not use a dictionary, only a heuristic algorithm that sometimes (though reasonably seldom) gives wrong results.

Why use it

If you want your texts to look more professional, consider using Hyph. Hyphenation is ideally suited to blogs, articles, and narrow-column layouts. It makes text easier to read and more consistent in appearance. All you have to do is add two or three lines to your HTML code. Viewing a hyphenated page in a browser without JavaScript is not a problem: it displays the text properly, just without hyphenation. For instructions, see the Usage section.

If you use Hyph on your site, please let me know. Send your site's URL and your comments using the form at snusmumrik.org.ru/hyph/feedback. Please also send feedback on your experience with the program (bugs, missing features, success stories ;) by e-mail.

Features

  • hyphenation of English and Russian texts
  • uses CSS selectors to select the nodes to hyphenate
  • easily extendable word-division algorithm
  • based on CSS2 and JavaScript
  • search-engine compatible
  • compatible with old browsers

Downloads and updates

Downloads and updates are available at http://snusmumrik.org.ru/hyph.

Contact me at cyril7@gmail.com.

Usage

To enable hyphenation, copy hyphens.js, hyphens.css and hyphen.gif to your web server. Then add the following lines to the head element of your HTML file:

<link rel="stylesheet" type="text/css"
    href="path/to/hyphens.css" />
<script type="text/javascript"
    src="path/to/hyphens.js"></script>
<script type="text/javascript">
    Hyphenator.hyphenate('#doc p') </script>

Do not forget to set the correct paths to the files and to change the path to hyphen.gif in hyphens.css. Then modify the CSS selector (#doc p in the example) to select the elements you want to hyphenate.

By default, Hyph doesn't apply hyphenation to the following elements: code, samp, kbd, var, abbr, acronym, sub, sup, pre, button, option, label. You may change this behavior by altering the Hyphenator.skipTags array.
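As a minimal sketch of altering the skip list: the code below assumes that Hyphenator.skipTags is a plain array of tag-name strings. Its letter case is not stated here, so the comparison ignores case. The helpers withoutTags and withTags are hypothetical and not part of Hyph, and the defaults array is only a stand-in for the list quoted above.

```javascript
// Hypothetical helpers, not part of Hyph. They assume the skip list is a
// plain array of tag-name strings and compare names case-insensitively.
function withoutTags(skipTags, tagsToAllow) {
    var allow = tagsToAllow.map(function (t) { return t.toLowerCase(); });
    return skipTags.filter(function (tag) {
        return allow.indexOf(tag.toLowerCase()) === -1;
    });
}

function withTags(skipTags, tagsToSkip) {
    var result = skipTags.slice();
    tagsToSkip.forEach(function (tag) {
        var known = result.some(function (t) {
            return t.toLowerCase() === tag.toLowerCase();
        });
        if (!known) { result.push(tag); }
    });
    return result;
}

// Stand-in for the default list quoted above.
var defaults = ['code', 'samp', 'kbd', 'var', 'abbr', 'acronym',
                'sub', 'sup', 'pre', 'button', 'option', 'label'];

// Hyphenate abbr and label as well, but leave blockquote untouched.
var custom = withTags(withoutTags(defaults, ['abbr', 'label']), ['blockquote']);

// On a page with Hyph loaded, the result would presumably be assigned
// before calling Hyphenator.hyphenate:
//     Hyphenator.skipTags = custom;
```

Building a new array instead of editing the default one in place keeps the original list available if you need to restore it.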

Note that the hyphen used in IE and Opera is a 5×1 px image (hyphen.gif). If you use a large or very small font for your text, you should modify hyphen.gif and then adjust hyphens.css to set the margins needed to cover your own hyphen. You won't have this problem in Firefox, because it uses a minus character as the hyphen. If you want to use a different character there, for example an en dash or em dash, modify the .hhf:after rule in hyphens.css.

The second issue is that you must specify the text background color in hyphens.css. Find the following rule in the file and change the color value.

.hso, .hsi, .hsf {
	background-color: #fff;
}

If you use different fonts on different backgrounds, you can write more specific CSS rules that select the relevant elements and change the margins and hyphen images.

Examples

You can find a sample (test.html) in the Hyph distribution. This file uses Hyph as well, so you should see hyphens if you use a modern JavaScript-enabled browser. My own site, snusmumrik.org.ru, is an example of Hyph in use online. More online examples are at snusmumrik.org.ru/hyph/examples.

Technical information

The technology consists of two main parts: syllable division (the JavaScript) and making the browser break words (the CSS). In fact, you could divide words on the server and drop the need for JavaScript, but the size of the HTML file would increase dramatically, so word processing is moved to the client side. The purpose of the JavaScript is to turn the text into a sequence of span elements like the following:

For Opera:
The <span class="hpo">tech</span><span class="hbo"></span>
<span class="hso hpo">no</span><span class="hbo"></span>
<span class="hso hpo">lo</span><span class="hbo"></span>
<span class="hso">gy</span> ...
For Firefox:
The tech<span class="hhf"></span><span class="hsf">no</span>
<span class="hhf"></span><span class="hsf">lo</span>
<span class="hhf"></span><span class="hsf">gy</span> ...
For IE:
The <span class="hpi">tech</span><span class="hii"></span>
<span class="hsi hpi">no</span><span class="hii"></span>
<span class="hsi hpi">lo</span><span class="hii"></span>
<span class="hsi">gy</span> ...

In the class names, the first letter h carries no meaning. In the second position, h stands for hyphen, p for prefix, and i for infix. In the third position, i stands for IE, o for Opera, and f for Firefox. The CSS rules create a hidden space at which words can break but which does not appear in the DOM or the clipboard. The CSS also adds the hyphen, either as an image or as a character, and then uses a negative margin so that the following text, with its white background, covers the hyphen.
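To make the Firefox sample above concrete, here is a minimal sketch that builds the same markup from a list of syllables. It is not Hyph's addSlab code: the names escapeHtml and firefoxMarkup are hypothetical, and the sketch assumes that the line breaks in the sample are there only for readability.

```javascript
// Escape characters that would otherwise be read as markup.
function escapeHtml(text) {
    return text.replace(/&/g, '&amp;')
               .replace(/</g, '&lt;')
               .replace(/>/g, '&gt;');
}

// Hypothetical helper: joins syllables the way the Firefox sample does.
// The first syllable stays bare; every later one is preceded by an empty
// "hhf" span (the hyphen) and wrapped in an "hsf" span.
function firefoxMarkup(syllables) {
    return syllables.map(function (syllable, index) {
        var text = escapeHtml(syllable);
        if (index === 0) { return text; }
        return '<span class="hhf"></span><span class="hsf">' + text + '</span>';
    }).join('');
}

var markup = firefoxMarkup(['tech', 'no', 'lo', 'gy']);
```

The Opera and IE variants would follow the same pattern with their own class names and an extra span around the first syllable, as the samples show.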

The JavaScript defines a Hyphenator class and a document.getElementsBySelector method. The purpose of the latter should be clear from its name. The Hyphenator class contains the init method, which is invoked right after the class definition; the methods that start the hyphenation process (hyphenate, hyphenateNode, hyphenateNodeList); and some internal methods. If the starting methods are called before the document is fully loaded, the request is placed in a queue, and queued requests are processed on the onload event.
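The queueing behaviour can be illustrated with a small sketch. This is not Hyph's own code: createDeferredRunner and its methods are hypothetical names, and the selectors are only sample data.

```javascript
// Hypothetical sketch of "queue until loaded"; not Hyph's own code.
function createDeferredRunner() {
    var loaded = false;
    var queue = [];
    return {
        // Run the request now if the document is ready, otherwise keep it.
        request: function (task) {
            if (loaded) { task(); } else { queue.push(task); }
        },
        // Call once from the load handler: runs queued requests in order.
        documentLoaded: function () {
            loaded = true;
            while (queue.length > 0) { queue.shift()(); }
        }
    };
}

var log = [];
var runner = createDeferredRunner();
runner.request(function () { log.push('#doc p'); });      // queued
runner.request(function () { log.push('#sidebar p'); });  // queued
runner.documentLoaded();                                   // both run now
runner.request(function () { log.push('#footer p'); });   // runs at once

// In a browser the wiring would presumably look like:
//     window.onload = runner.documentLoaded;
```

This is why the call to Hyphenator.hyphenate can sit in the head element, before the body it refers to has been parsed.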

startHyphenate: entry point of the internal hyphenation process for a single node.

hyphenateRecursive: iterates over a node's children and passes text nodes to hyphenateInt.

divideText, divideTextInt: word-division functions. You may want to redefine these methods to improve the algorithm. You might need to detect the language in divideText and dispatch to different implementations of divideTextInt.

addSlab: a not well-coded method ;) that adds styled spans, each containing a syllable, to the result document. You may want to redefine it if you use another CSS technique to break words.
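The language dispatch suggested for divideText could look roughly like the sketch below. The real signatures of divideText and divideTextInt are not given here, so every name in the sketch (detectScript, dividers, divideByLanguage) is hypothetical, and the dividers are placeholders that return the word unsplit.

```javascript
// Hypothetical sketch; names and signatures are assumptions, not Hyph's API.
// Decide which word-division routine fits a word by looking at its letters.
function detectScript(word) {
    if (/[\u0400-\u04FF]/.test(word)) { return 'ru'; }  // Cyrillic block
    if (/[A-Za-z]/.test(word)) { return 'en'; }
    return 'other';
}

// Placeholder dividers: each returns the word as an array of pieces.
// Real implementations would hold the per-language heuristics.
var dividers = {
    en: function (word) { return [word]; },
    ru: function (word) { return [word]; },
    other: function (word) { return [word]; }
};

// Dispatcher in the spirit of divideText calling a per-language divideTextInt.
function divideByLanguage(word) {
    return dividers[detectScript(word)](word);
}
```

Keeping the dividers in a lookup table means a new language only needs a new entry and a matching rule in detectScript.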

To do

  • solve problems with the Russian letter Ь
  • the letter-spacing problem
  • improve the accuracy of the English word-breaking algorithm
  • use CSS3 properties (when browsers support them)

Version history

Version 0.99
  • Safari and Konqueror support
Version 0.97, 0.98
  • Opera shows hyphens in a stable way
  • Fixed hyphens occasionally being displayed unexpectedly in IE
Version 0.96
  • Skipped elements list
Version 0.95
  • Major changes in word division algorithms
  • Introduced CSS-queries to select nodes to hyphenate
  • Minor bug-fixes
Version 0.9
  • First release