com.openexchange.mail.text.parser.handler
Class HTML2TextHandler

java.lang.Object
  extended by com.openexchange.mail.text.parser.handler.HTML2TextHandler
All Implemented Interfaces:
HTMLHandler

public final class HTML2TextHandler
extends java.lang.Object
implements HTMLHandler

HTML2TextHandler - A handler that generates a plain-text version of parsed HTML content; the result is available via getText().

Author:
Thorben Betten

Constructor Summary
HTML2TextHandler(int capacity, boolean appendHref)
          Initializes a new HTML2TextHandler.
 
Method Summary
 java.lang.String getText()
          Gets the extracted text.
 void handleCDATA(java.lang.String text)
          Handles the text of the specified CDATA segment.
 void handleComment(java.lang.String comment)
          Handles specified comment.
 void handleDocDeclaration(java.lang.String docDecl)
          Handles the DOCTYPE declaration.
 void handleEndTag(java.lang.String tag)
          Handles specified end tag.
 void handleError(java.lang.String errorMsg)
          Handles specified error.
 void handleSimpleTag(java.lang.String tag, java.util.Map<java.lang.String,java.lang.String> attributes)
          Handles specified simple tag.
 void handleStartTag(java.lang.String tag, java.util.Map<java.lang.String,java.lang.String> attributes)
          Handles specified start tag.
 void handleText(java.lang.String text, boolean ignorable)
          Handles specified text.
 void handleXMLDeclaration(java.lang.String version, java.lang.Boolean standalone, java.lang.String encoding)
          Handles the <?xml... ?> declaration.
 HTML2TextHandler reset()
          Resets this handler for reuse.
 void setContextId(int contextId)
          Sets the context ID for debugging purposes in handleError(String).
 void setMailFolderPath(java.lang.String mailFolderPath)
          Sets the mail folder path for debugging purposes in handleError(String).
 void setMailId(long mailId)
          Sets the mail ID for debugging purposes in handleError(String).
 void setUserId(int userId)
          Sets the user ID for debugging purposes in handleError(String).
 
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

HTML2TextHandler

public HTML2TextHandler(int capacity,
                        boolean appendHref)
Initializes a new HTML2TextHandler.

Parameters:
capacity - The initial capacity
appendHref - true to append URLs contained in href and src attributes; otherwise false.
Example: <a href="www.somewhere.com">Link</a> would become Link [www.somewhere.com]
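
The appendHref behavior can be pictured with a minimal sketch. HrefAppendSketch and its convertLink helper are hypothetical stand-ins written for this illustration; they only mimic the documented output format and are not the actual HTML2TextHandler implementation.

```java
import java.util.Map;

// Hypothetical stand-in that only mimics the documented appendHref output;
// it is not the real HTML2TextHandler.
final class HrefAppendSketch {

    private final StringBuilder text;
    private final boolean appendHref;
    private String pendingHref; // URL of the currently open <a> tag, if any

    HrefAppendSketch(int capacity, boolean appendHref) {
        this.text = new StringBuilder(capacity);
        this.appendHref = appendHref;
    }

    void handleStartTag(String tag, Map<String, String> attributes) {
        // Remember the link target only when URL appending is enabled.
        if (appendHref && "a".equalsIgnoreCase(tag)) {
            pendingHref = attributes.get("href");
        }
    }

    void handleText(String s) {
        text.append(s);
    }

    void handleEndTag(String tag) {
        // Emit the remembered URL in brackets once the link closes.
        if ("a".equalsIgnoreCase(tag) && pendingHref != null) {
            text.append(" [").append(pendingHref).append(']');
            pendingHref = null;
        }
    }

    String getText() {
        return text.toString();
    }

    // Demo convenience: feeds a single anchor element through the callbacks.
    static String convertLink(String href, String label, boolean appendHref) {
        HrefAppendSketch handler = new HrefAppendSketch(64, appendHref);
        handler.handleStartTag("a", Map.of("href", href));
        handler.handleText(label);
        handler.handleEndTag("a");
        return handler.getText();
    }

    public static void main(String[] args) {
        System.out.println(convertLink("www.somewhere.com", "Link", true));
        System.out.println(convertLink("www.somewhere.com", "Link", false));
    }
}
```

With appendHref set to false the sketch emits only the link label, which is the assumed counterpart of the documented true case.
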
Method Detail

getText

public java.lang.String getText()
Gets the extracted text.

Returns:
The extracted text

setMailFolderPath

public void setMailFolderPath(java.lang.String mailFolderPath)
Sets the mail folder path for debugging purposes in handleError(String).

Parameters:
mailFolderPath - The mail folder path to set

setMailId

public void setMailId(long mailId)
Sets the mail ID for debugging purposes in handleError(String).

Parameters:
mailId - The mail ID to set

setUserId

public void setUserId(int userId)
Sets the user ID for debugging purposes in handleError(String).

Parameters:
userId - The user ID to set

setContextId

public void setContextId(int contextId)
Sets the context ID for debugging purposes in handleError(String).

Parameters:
contextId - The context ID to set

handleComment

public void handleComment(java.lang.String comment)
Description copied from interface: HTMLHandler
Handles the specified comment. The value is passed without the leading "<!--" and the trailing "-->".

Specified by:
handleComment in interface HTMLHandler
Parameters:
comment - The comment

handleDocDeclaration

public void handleDocDeclaration(java.lang.String docDecl)
Description copied from interface: HTMLHandler
Handles the DOCTYPE declaration. The value is passed without the leading "<!DOCTYPE" and the trailing ">"; e.g.
 '<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
        "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">'
 
yields
  ' html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
        "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"'
 

Specified by:
handleDocDeclaration in interface HTMLHandler
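
The stripping described above can be sketched as follows. DocDeclSketch and stripDoctype are hypothetical names used only for illustration; the parser that actually invokes handleDocDeclaration may derive the value differently.

```java
// Hypothetical helper illustrating the documented stripping of a DOCTYPE
// declaration; not part of the documented API.
final class DocDeclSketch {

    private static final String PREFIX = "<!DOCTYPE";

    static String stripDoctype(String declaration) {
        // The keyword is matched case-insensitively (an assumption of this sketch).
        boolean hasPrefix = declaration.regionMatches(true, 0, PREFIX, 0, PREFIX.length());
        if (!hasPrefix || !declaration.endsWith(">")) {
            throw new IllegalArgumentException("Not a DOCTYPE declaration: " + declaration);
        }
        // Keep everything between the prefix and the closing '>',
        // including the leading space shown in the example above.
        return declaration.substring(PREFIX.length(), declaration.length() - 1);
    }

    public static void main(String[] args) {
        System.out.println("'" + stripDoctype("<!DOCTYPE html>") + "'");
    }
}
```
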

handleEndTag

public void handleEndTag(java.lang.String tag)
Description copied from interface: HTMLHandler
Handles specified end tag.

Specified by:
handleEndTag in interface HTMLHandler
Parameters:
tag - The tag's name

handleError

public void handleError(java.lang.String errorMsg)
Description copied from interface: HTMLHandler
Handles specified error.

Specified by:
handleError in interface HTMLHandler
Parameters:
errorMsg - The error message

handleSimpleTag

public void handleSimpleTag(java.lang.String tag,
                            java.util.Map<java.lang.String,java.lang.String> attributes)
Description copied from interface: HTMLHandler
Handles specified simple tag.

Specified by:
handleSimpleTag in interface HTMLHandler
Parameters:
tag - The tag's name
attributes - The tag's attributes as an unmodifiable map

handleStartTag

public void handleStartTag(java.lang.String tag,
                           java.util.Map<java.lang.String,java.lang.String> attributes)
Description copied from interface: HTMLHandler
Handles specified start tag.

Specified by:
handleStartTag in interface HTMLHandler
Parameters:
tag - The tag's name
attributes - The tag's attributes as an unmodifiable map

handleCDATA

public void handleCDATA(java.lang.String text)
Description copied from interface: HTMLHandler
Handles specified CDATA segment's text; e.g. 'fo<o' from '<![CDATA[fo<o]]>'.

Specified by:
handleCDATA in interface HTMLHandler
Parameters:
text - The CDATA segment's text
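
As a small illustration of the example above, the following sketch extracts the text that would be handed to handleCDATA. CdataSketch and cdataText are hypothetical names; the point is only that markup characters inside the segment, such as '<', are passed through literally.

```java
// Hypothetical helper showing which part of a CDATA segment is the "text";
// not part of the documented API.
final class CdataSketch {

    private static final String OPEN = "<![CDATA[";
    private static final String CLOSE = "]]>";

    static String cdataText(String segment) {
        if (segment.length() < OPEN.length() + CLOSE.length()
                || !segment.startsWith(OPEN) || !segment.endsWith(CLOSE)) {
            throw new IllegalArgumentException("Not a CDATA segment: " + segment);
        }
        // Everything between the delimiters is returned untouched.
        return segment.substring(OPEN.length(), segment.length() - CLOSE.length());
    }

    public static void main(String[] args) {
        System.out.println(cdataText("<![CDATA[fo<o]]>"));
    }
}
```
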

handleText

public void handleText(java.lang.String text,
                       boolean ignorable)
Description copied from interface: HTMLHandler
Handles specified text.

Note: Specified text contains all control characters from corresponding HTML content; e.g.:

 Sorry if my article tried to imply that this is a
           new thing (I hope it hasn't).
 
will be given unchanged, with the line break and indentation preserved:
 Sorry if my article tried to imply that this is a
           new thing (I hope it hasn't).
 

Note: Text that contains only whitespace characters is omitted.

Specified by:
handleText in interface HTMLHandler
Parameters:
text - The text
ignorable - true if specified text may be ignored since it only serves for formatting; otherwise false
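
The two notes above can be illustrated with a minimal sketch. TextFilterSketch is a hypothetical stand-in: it assumes the whitespace-only check happens inside the handler (the documentation does not say whether the parser or the handler performs it) and leaves the ignorable flag uninterpreted.

```java
// Hypothetical stand-in illustrating the two notes on handleText;
// not the real HTML2TextHandler.
final class TextFilterSketch {

    private final StringBuilder out = new StringBuilder();

    void handleText(String text, boolean ignorable) {
        // Whitespace-only text is dropped (String.isBlank requires Java 11+).
        if (text.isBlank()) {
            return;
        }
        // Everything else is kept verbatim, including line breaks and
        // indentation. The ignorable flag is accepted but not interpreted
        // here, since its exact treatment is not documented.
        out.append(text);
    }

    String getText() {
        return out.toString();
    }

    public static void main(String[] args) {
        TextFilterSketch sketch = new TextFilterSketch();
        sketch.handleText(" \r\n\t", true);
        sketch.handleText("Sorry if my article tried to imply that this is a\n"
                + "           new thing (I hope it hasn't).", false);
        System.out.println(sketch.getText());
    }
}
```
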

reset

public HTML2TextHandler reset()
Resets this handler for reuse.

Returns:
This HTML2TextHandler instance
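
Because reset() returns the handler itself, one instance can be reused and calls can be chained. The sketch below shows that pattern with a hypothetical stand-in class, ResettableSketch; which fields the real reset() clears (for example the debugging IDs) is not stated here, so the sketch only clears the collected text.

```java
// Hypothetical stand-in showing the "reset and return this" pattern;
// not the real HTML2TextHandler.
final class ResettableSketch {

    private final StringBuilder text = new StringBuilder(64);

    ResettableSketch handleText(String s) {
        text.append(s);
        return this;
    }

    String getText() {
        return text.toString();
    }

    // Clears the collected text but keeps the allocated buffer for reuse.
    ResettableSketch reset() {
        text.setLength(0);
        return this;
    }

    public static void main(String[] args) {
        ResettableSketch handler = new ResettableSketch();
        System.out.println(handler.handleText("first mail").getText());
        // Reuse the same instance for the next document.
        System.out.println(handler.reset().handleText("second mail").getText());
    }
}
```
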

handleXMLDeclaration

public void handleXMLDeclaration(java.lang.String version,
                                 java.lang.Boolean standalone,
                                 java.lang.String encoding)
Description copied from interface: HTMLHandler
Handles the <?xml... ?> declaration.

Specified by:
handleXMLDeclaration in interface HTMLHandler
Parameters:
version - The version; either "1.0" or null
standalone - The standalone boolean value; either Boolean.TRUE, Boolean.FALSE, or null
encoding - The encoding; the charset name or null
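
All three parameters may be null, and standalone is a three-state Boolean, so an implementation has to check each one separately. The following sketch rebuilds a declaration string from the arguments to show that null handling; XmlDeclSketch and describe are hypothetical names, and the documentation does not say what HTML2TextHandler itself does with these values.

```java
// Hypothetical helper illustrating null-safe handling of the three
// handleXMLDeclaration arguments; not part of the documented API.
final class XmlDeclSketch {

    static String describe(String version, Boolean standalone, String encoding) {
        StringBuilder sb = new StringBuilder("<?xml");
        if (version != null) {
            sb.append(" version=\"").append(version).append('"');
        }
        if (encoding != null) {
            sb.append(" encoding=\"").append(encoding).append('"');
        }
        // null means the standalone attribute was absent from the declaration.
        if (standalone != null) {
            sb.append(" standalone=\"").append(standalone.booleanValue() ? "yes" : "no").append('"');
        }
        return sb.append("?>").toString();
    }

    public static void main(String[] args) {
        System.out.println(describe("1.0", Boolean.TRUE, "UTF-8"));
        System.out.println(describe("1.0", null, null));
    }
}
```
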