About markup.py

Markup.py is an intuitive, lightweight, easy-to-use, customizable and pythonic HTML/XML generator. It attempts to help developers output valid HTML 4.01 as defined by the W3C specification. The set of valid and invalid elements is not hard-wired and is easily customizable. There are no third-party dependencies; only the standard Python library is needed to use markup.py.

Markup.py works with both Python 2 and Python 3.

The current (1 November 2012) version is 1.9.

The code is in the public domain; you can do absolutely whatever you want with it.

Markup.py is a very simple and lightweight tool for quickly and painlessly putting together HTML/XML documents with practically no learning curve. If you need something for a production environment, choose something else, e.g. ElementTree.

About the name: it is markup.py and not just markup so that it is easier to search for it on search engines. Note that there was once a Markup package from Edgewall which changed its name to Genshi and now can be found at http://genshi.edgewall.org.

A quick example: the following code

    import markup

    items = ( "Item one", "Item two", "Item three", "Item four" )
    paras = ( "This was a fantastic list.", "And now for something completely different." )
    images = ( "thumb1.jpg", "thumb2.jpg", "more.jpg", "more2.jpg" )

    page = markup.page( )
    page.init( title="My title", 
               css=( 'one.css', 'two.css' ), 
               header="Something at the top", 
               footer="The bitter end." )

    page.ul( class_='mylist' )
    page.li( items, class_='myitem' )
    page.ul.close( )

    page.p( paras )
    page.img( src=images, width=100, height=80, alt="Thumbnails" )

    print( page )
            

will result in

        <!DOCTYPE HTML PUBLIC '-//W3C//DTD HTML 4.01 Transitional//EN'>
        <html lang='en'>
        <head>
        <link media='all' href='one.css' type='text/css' rel='stylesheet' />
        <link media='all' href='two.css' type='text/css' rel='stylesheet' />
        <title>My title</title>
        </head>
        <body>
        Something at the top
        <ul class='mylist'>
        <li class='myitem'>Item one</li>
        <li class='myitem'>Item two</li>
        <li class='myitem'>Item three</li>
        <li class='myitem'>Item four</li>
        </ul>
        <p>This was a fantastic list.</p>
        <p>And now for something completely different.</p>
        <img width='100' alt='Thumbnails' src='thumb1.jpg' height='80' />
        <img width='100' alt='Thumbnails' src='thumb2.jpg' height='80' />
        <img width='100' alt='Thumbnails' src='more.jpg' height='80' />
        <img width='100' alt='Thumbnails' src='more2.jpg' height='80' />
        The bitter end.
        </body>
        </html>
                

See below for more examples, or download examples.py, a module with even more examples.

The emphasis is on being lightweight: markup.py is not a web framework, a full-blown templating engine or a parser. It is a single file with only three classes plus exceptions, and its only purpose in life is to generate HTML/XML without cluttering your Python code with tags.

There are three modes of operation. In strict HTML mode an exception is raised for deprecated elements; in loose HTML mode deprecated elements are allowed; and in XML mode arbitrary elements are allowed by default, or the set of valid elements may be customized.

Apart from the basic elements, both HTML modes offer shorthand methods for frequently used combinations of tags, such as defining a CSS stylesheet or defining a header or footer.

Basic operation of markup.py tries to be simple. Element definition in your Python code is done via

    mypage.element( text, attr1=value1, attr2=value2 )

which will be rendered as

    <element attr1='value1' attr2='value2'>text</element>

for an element with both opening and closing tags. Here mypage is an instance of markup.page. For an element with an opening tag only, non-keyword arguments raise an exception, so only

    mypage.element( attr1=value1, attr2=value2 )

is allowed and will be rendered as

    <element attr1='value1' attr2='value2' />

The set of valid elements is customizable. You can define two sets: one for elements with opening tags only and one for elements with both opening and closing tags. If you write a document with a fixed set of elements it is useful to define these, because typos will then be recognized and appropriate exceptions raised. There are predefined sets for HTML and an option to allow arbitrary elements.

In order to eliminate the need for explicit loops over the same elements, all arguments can be lists, tuples or in fact any iterable. That is, text, value1 and value2 in the above examples may be iterables, and the corresponding element will be repeated as many times as the iterable has entries. If the lengths are not the same (which is allowed), the longest determines the number of elements rendered, and the last entry of each shorter iterable is repeated as many times as needed to make it as long as the longest.
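
The padding rule just described can be sketched in a few lines of plain Python. This is only an illustration of the rule, not the code markup.py uses internally; the helper names as_list and broadcast are made up for this sketch, and every sequence is assumed to be non-empty.

```python
def as_list(value):
    # Treat strings and non-iterables (e.g. numbers) as a single entry.
    if isinstance(value, str) or not hasattr(value, "__iter__"):
        return [value]
    return list(value)


def broadcast(*values):
    """Pad shorter sequences with their last entry (hypothetical helper)."""
    columns = [as_list(v) for v in values]
    longest = max(len(c) for c in columns)
    # Repeat the last entry of each shorter column until it matches the longest.
    padded = [c + [c[-1]] * (longest - len(c)) for c in columns]
    return list(zip(*padded))


# Three sources, two alt texts and one width give three elements.
for src, alt, width in broadcast(("a.jpg", "b.jpg", "c.jpg"), ("First", "Rest"), 100):
    print("<img src='%s' alt='%s' width='%s' />" % (src, alt, width))
```

The second and third image both receive the alt text "Rest", and all three receive the width 100.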

It is also possible to nest elements. For full documentation see below.

Related projects include makeHTML, HTMLgen and pyhtmloo.


Features


  • Conforms to HTML 4.01 by default

  • Optionally a customizable set of valid elements can be used for XML

  • Optionally escape <, > and & characters as &lt;, &gt; and &amp;

  • Output elements in either upper or lower case (or as they are entered)

  • Nesting of elements

  • Saves a lot of annoying tag typing! :)


Examples

HTML

The following code

    import markup

    title = "Useless Inc."
    header = "Some information at the top, perhaps a menu."
    footer = "This is the end."
    styles = ( 'layout.css', 'alt.css', 'images.css' )

    page = markup.page( )
    page.init( css=styles, title=title, header=header, footer=footer )
    page.br( )
    
    paragraphs = ( "This will be a paragraph.",
                   "So as this, only slightly longer, but not much.",
                   "Third absolutely boring paragraph." )

    page.p( paragraphs )
        
    page.a( "Click this.", class_='internal', href='index.html' )
    page.img( width=60, height=80, alt='Fantastic!', src='fantastic.jpg' )

    print( page )
            

will result in

    <!DOCTYPE HTML PUBLIC '-//W3C//DTD HTML 4.01 Transitional//EN'>
    <html lang='en'>
    <head>
    <link href='layout.css' type='text/css' rel='stylesheet' />
    <link href='alt.css' type='text/css' rel='stylesheet' />
    <link href='images.css' type='text/css' rel='stylesheet' />
    <title>Useless Inc.</title>
    </head>
    <body>
    Some information at the top, perhaps a menu.
    <br />
    <p>This will be a paragraph.</p>
    <p>So as this, only slightly longer, but not much.</p>
    <p>Third absolutely boring paragraph.</p>
    <a href='index.html' class='internal'>Click this.</a>
    <img src='fantastic.jpg' alt='Fantastic!' height='80' width='60' />
    This is the end.
    </body>
    </html>
            

For an HTML snippet without doctype, head, title, etc., omit the call to the init( ) method:

    import markup

    images = ( 'egg.jpg', 'spam.jpg', 'eggspam.jpg' )
    
    page = markup.page( case='upper' )

    page.div( class_='thumbs' )
    page.img( width=60, height=80, src=images, class_='thumb' )
    page.div.close( )

    print( page )
            

which will output

    <DIV class='thumbs'>
    <IMG src='egg.jpg' height='80' class='thumb' width='60' />
    <IMG src='spam.jpg' height='80' class='thumb' width='60' />
    <IMG src='eggspam.jpg' height='80' class='thumb' width='60' />
    </DIV>
            

XML

In the first example any element is allowed to occur in the document and a complete document will be generated. Note the mode='xml' keyword argument passed when instantiating markup.page.

    import markup

    titles = ( 'Best features of M-theory', 'Best bugs in M-theory', 'Branes and brains' )
    universities = ( 'Cambridge', 'MIT', 'Amsterdam' )
    dates = ( 'January', 'February', 'March' )

    myxml = markup.page( mode='xml' )
    myxml.init( encoding='ISO-8859-2' )

    myxml.cv.open( )
    myxml.talk( titles, university=universities, date=dates )
    myxml.cv.close( )

    print( myxml )
            

The above will output

    <?xml version='1.0' encoding='ISO-8859-2' ?>
    <cv>
    <talk date='January' university='Cambridge'>Best features of M-theory</talk>
    <talk date='February' university='MIT'>Best bugs in M-theory</talk>
    <talk date='March' university='Amsterdam'>Branes and brains</talk>
    </cv>
            

Limiting the set of valid elements is done through the onetags and twotags options passed when instantiating markup.page:

    import markup

    names =     ( 'Alice', 'Bob', 'Eve' )
    positions = ( 'encryption', 'encryption', 'eavesdropper' )
    locations = ( 'headquarters', 'headquarters', 'unknown' )

    myxml = markup.page( mode='xml', onetags=[ 'person', 'location' ], twotags=[ 'company' ] )
    myxml.init( )
    
    myxml.company( name='Secret' )
    myxml.person( name=names, position=positions, location=locations )
    myxml.location( name=( 'headquarters', 'unknown' ), address=( 'here', 'hmmmm' ) )
    myxml.company.close( )

    print( myxml )
            

The above will result in

    <?xml version='1.0' ?>
    <company name='Secret'>
    <person position='encryption' location='headquarters' name='Alice' />
    <person position='encryption' location='headquarters' name='Bob' />
    <person position='eavesdropper' location='unknown' name='Eve' />
    <location name='headquarters' address='here' />
    <location name='unknown' address='hmmmm' />
    </company>
            

In the above example passing a non-keyword argument to person would have raised an exception.


Documentation

Each element is represented by a callable subclass of markup.page. Elements requiring both an opening and closing tag and those with only opening tags are distinguished and handled differently. The first (non-keyword) argument to an element will appear between the opening and closing tags and any keyword argument is interpreted as an attribute. A non-keyword argument for elements requiring only an opening tag raises an exception.

Thus if mypage is an instance of markup.page then

    mypage.element( text, attr1=value1, attr2=value2 )

will be rendered as

    <element attr1='value1' attr2='value2'>text</element>

for an element with both opening and closing tags. For an element with an opening tag only, non-keyword arguments raise an exception and only

    mypage.element( attr1=value1, attr2=value2 )

is allowed and will be rendered as

    <element attr1='value1' attr2='value2' />

It is possible to output an attribute without a value by specifying None as the keyword argument:

    mypage.element( checked=None )

which will result in

    <element checked />
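
The rendering rules above can be summarised in a small stand-alone sketch. It is not markup.py's implementation: the function render and its single flag are hypothetical names introduced here, and the sketch always emits a complete element rather than supporting separate open and close calls.

```python
def render(tag, text=None, single=False, **attrs):
    """Render one element as a string (illustrative sketch, not markup.py code)."""
    parts = [tag]
    for name, value in attrs.items():
        if value is None:
            parts.append(name)                        # bare attribute, e.g. checked
        else:
            parts.append("%s='%s'" % (name, value))   # attr='value'
    opening = " ".join(parts)
    if single:
        if text is not None:
            # Mirrors the rule that opening-only elements take no text.
            raise ValueError("an opening-only element takes no text")
        return "<%s />" % opening
    return "<%s>%s</%s>" % (opening, "" if text is None else text, tag)


print(render("a", "Click this.", href="index.html"))
print(render("input", single=True, type="checkbox", checked=None))
```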

Pure text without tags is added with the add( ) method of markup.page, which unfortunately forbids the use of add as an element name. The method add( ) should be called with a string argument, which will be added to the document without any tags. Similarly, addheader( ) adds text to the top of the document, after any header information added by the init( ) method, and addfooter( ) adds text to the bottom of the document, also after any footer added by init( ).

All arguments can also be lists, tuples or any iterable of strings, in which case the element in question will be repeated as many times as the length of the longest one. Shorter ones will be filled up with their last entry to become as long as the longest. This eliminates the need for explicit loops in the code and can be used for both attributes (keyword arguments) and the content of an element (non-keyword argument). The quick example at the top of this page, where a tuple of items is passed to page.li( ), shows this behaviour.

In addition, elements requiring both opening and closing tags have two methods, open( ) and close( ), while those with only an opening tag have just one, open( ). The open( ) method accepts any number of keyword arguments, which turn into attributes. The close( ) method takes no arguments. Using these methods, elements can be explicitly opened or closed. The validity of attributes is not checked.

It is possible to nest elements via the oneliner object. For instance:

    import markup
    from markup import oneliner as e

    page = markup.page( )
    page.a( e.img( src='myimage.jpg' ), href='http://hak5.org/' )
    print( page )

would give

    <a href='http://hak5.org/'><img src='myimage.jpg' /></a>

The oneliner object is used in exactly the same way as page but returns the specified element without any whitespace. There is a corresponding object upper_oneliner that will output upper case elements.

Since class is a reserved keyword in Python, it cannot be given as an attribute to any element, although it is frequently used in HTML. To overcome this, use class_ instead and it will turn into class in the actual output. This convention was chosen to be in line with PEP 8. There is a similar problem with http-equiv and similar attributes because the - sign is interpreted as subtraction by Python. Use an underscore instead: http_equiv, etc.
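
As a rough illustration of this naming convention, the mapping from keyword-argument names to attribute names might look like the sketch below. The function attribute_name is hypothetical, and the sketch assumes that every inner underscore becomes a hyphen, which the text suggests but does not spell out; markup.py itself may only treat particular attributes this way.

```python
def attribute_name(keyword):
    """Map a Python keyword-argument name to an attribute name (sketch only)."""
    name = keyword.rstrip("_")       # class_ -> class
    return name.replace("_", "-")    # http_equiv -> http-equiv (assumed rule)


for keyword in ("class_", "http_equiv", "href"):
    print("%s -> %s" % (keyword, attribute_name(keyword)))
```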

Defining a new document starts with instantiating the markup.page class. The following optional arguments may be passed to the constructor:

mode
Possible values are strict_html or html for HTML 4.01, loose_html for HTML allowing some deprecated elements and xml for arbitrary elements by default or a customized set of elements upon request. See the options onetags and twotags below. The default value of mode is strict_html.

case
Possible values are upper, lower and given; elements are rendered accordingly in upper case, in lower case, or in the case in which they are entered (for markup languages that are case sensitive). The default value is lower.

onetags
The value of this option should be a list defining the valid set of elements without closing tags. It is only interpreted in XML mode, and if it is defined the option twotags should be defined as well.

twotags
The value of this option is again a list, defining the valid set of elements with both opening and closing tags. It is only interpreted in XML mode, and if it is defined the option onetags should be defined as well.

separator
A string that will be printed between added elements. Defaults to a newline.

class_
A string that will be added to every element as a class attribute.

After instantiating the markup.page class it is possible to add content that is usually needed for any full document. This step is not obligatory: for an HTML/XML snippet, which should not include the document type, head, title, etc., it may be omitted. For a full document, however, it is useful and is done with the init( ) method, which accepts the following keyword argument in any mode:

doctype
If set its value will be the first line of the document and should represent the document type. Its default value is <!DOCTYPE HTML PUBLIC '-//W3C//DTD HTML 4.01 Transitional//EN'> in HTML mode and is <?xml version='1.0' ?> in XML.

There are three doctypes defined in the class doctype:

doctype.frameset
<!DOCTYPE HTML PUBLIC '-//W3C//DTD HTML 4.01 Frameset//EN' 'http://www.w3.org/TR/html4/frameset.dtd'>

doctype.strict
<!DOCTYPE HTML PUBLIC '-//W3C//DTD HTML 4.01//EN' 'http://www.w3.org/TR/html4/strict.dtd'>

doctype.loose
<!DOCTYPE HTML PUBLIC '-//W3C//DTD HTML 4.01 Transitional//EN' 'http://www.w3.org/TR/html4/loose.dtd'>

There are further options that depend on whether HTML or XML mode is used and will be detailed below. If the init( ) method is used in HTML mode the closing tags </body></html> will be added to the bottom of the document automatically.

Once coding of the document is finished, the actual content is available as a string from mypage( ) or str( mypage ), assuming that the markup.page instance is mypage. Sometimes it is useful to escape all <, > and & characters as &lt;, &gt; and &amp;, for example if you want to show them in a browser; in that case mypage( escape=True ) can be called.
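
For comparison, the same three replacements can be reproduced with the Python 3 standard library. The sketch below uses html.escape with quote=False so that only <, > and & are replaced; that mypage( escape=True ) produces exactly the same string is an assumption, not something stated above.

```python
from html import escape

snippet = "<p>Fish & chips</p>"

# quote=False leaves quotation marks alone and escapes only <, > and &.
escaped = escape(snippet, quote=False)
print(escaped)
```

A browser shows the escaped string as the literal markup instead of rendering it as a paragraph.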

Generating HTML

In strict_html mode the allowed elements with opening tags only are AREA, BASE, BR, COL, FRAME, HR, IMG, INPUT, LINK, META, PARAM.

The following elements have both an opening and a closing tag: A, ABBR, ACRONYM, ADDRESS, B, BDO, BIG, BLOCKQUOTE, BODY, BUTTON, CAPTION, CITE, CODE, COLGROUP, DD, DEL, DFN, DIV, DL, DT, EM, FIELDSET, FORM, FRAMESET, H1, H2, H3, H4, H5, H6, HEAD, HTML, I, IFRAME, INS, KBD, LABEL, LEGEND, LI, MAP, NOFRAMES, NOSCRIPT, OBJECT, OL, OPTGROUP, OPTION, P, PRE, Q, SAMP, SCRIPT, SELECT, SMALL, SPAN, STRONG, STYLE, SUB, SUP, TABLE, TBODY, TD, TEXTAREA, TFOOT, TH, THEAD, TITLE, TR, TT, UL, VAR.

All other elements will raise an InvalidElementError exception.

In loose_html mode in addition to the above ones some deprecated elements are also allowed. The ones with opening tags only are BASEFONT, ISINDEX.

Deprecated elements with both opening and closing tags are APPLET, CENTER, DIR, FONT, MENU, S, STRIKE, U.

See the W3C specification for more details.

The init( ) method has the following keyword arguments in addition to doctype discussed above:

charset
If set to somecharset it is passed to the <meta http-equiv='Content-Type' content="text/html; charset='somecharset'"> meta element and will be printed in the head section.

lang
Sets the language of the document in the <html> tag as <html lang='en'>. Defaults to en.

css
Its value can be a list of filenames or a single filename and the corresponding file(s) are added as CSS stylesheets via the link element in the head section.

metainfo
Its value is a dictionary of the form { 'name':'content' } which will be inserted into meta element(s) as <meta name='name' content='content'> (ignored in xml mode).

bodyattrs
Its value is a dictionary { 'key':'value', ... } which will be added as attributes of the <body> element as <body key='value' ... > (ignored in xml mode).

script
A dictionary of { 'src':'type', ... } which will be added as
<script type='text/type' src='src'></script>
It can also be a list [ 'src1', 'src2', ... ], in which case the type will be 'javascript' for all of them (ignored in xml mode).

title
Used to set the document title via the title element

base
Used to set the <base href='...'/> tag in <head>

header
Its value can be a string and is placed right after the <body> tag.

footer
Its value can be a string and is placed right before the </body> tag.

Generating XML

Apart from the doctype keyword argument of the init( ) method which is interpreted in both HTML and XML mode there is also a second one in XML mode:

encoding
If set to someencoding it is passed to the <?xml version='1.0' encoding='someencoding' ?> definition that will be the first line of the document. If not set the encoding attribute is omitted.

In XML mode any element, with either only an opening tag or both opening and closing tags, is allowed. Using the onetags and twotags options when instantiating the markup.page class, the default behaviour can be overridden: these lists then contain the only allowed elements with both opening and closing tags (twotags) and with an opening tag only (onetags). See above for an example. If these keyword arguments are used, an appropriate exception is raised whenever an element not in these lists is used, which helps catch typos.
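
The idea behind restricting the element set can be sketched independently of markup.py. The exception class UnknownElementError and the function check_element below are hypothetical names for this illustration; markup.py defines its own exception classes and performs the check internally.

```python
class UnknownElementError(Exception):
    """Hypothetical exception used only in this sketch."""


def check_element(tag, onetags, twotags):
    """Return 'one' or 'two' for a valid tag, otherwise raise (sketch only)."""
    if tag in onetags:
        return "one"    # opening tag only
    if tag in twotags:
        return "two"    # opening and closing tags
    raise UnknownElementError("element %r is not in onetags or twotags" % tag)


onetags = ["person", "location"]
twotags = ["company"]

print(check_element("person", onetags, twotags))
try:
    check_element("persno", onetags, twotags)   # a typo is caught immediately
except UnknownElementError as error:
    print(error)
```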

Download

You can download markup.py from the sourceforge project page.


Development

The current (1 November 2012) version is 1.9.

If you find any bugs, please send them to nogradi at gmail dot com.

If you have feature requests, suggestions for improvements or any other comments, please email those as well.

Here is a list of people who have contributed to the development of markup.py in one way or another; their efforts are gratefully acknowledged:


  • Roel Mathys

  • Brian Blais

  • Davide Cesari

  • Carsten Bock

  • Fred Gansevles

  • Thorsten Kampe

  • Jason Moiron

  • Jerry Davis



nogradi at gmail dot com