%ignore

> {-# LANGUAGE DeriveDataTypeable #-}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

> module Doc where
> import GHC.Generics
> import Data.Typeable
> import Data.Data
> import ListToTree

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

%endignore

==== Document and related types

=== Basic special markup and types for structured documents

== Basic types

A document is a list of top-level elements, such as paragraphs,
headings and lists:

> type Document = [TopElem]

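A top-level element is built from lists of simple elements
(""Elems""): plain text, bold or italic text, links, images and so on.
A plain paragraph holds a single such list, while a ""Div"" or a
special paragraph such as a bulleted list holds several of them (in a
bulleted list, one per entry).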

> type Elems = [Elem]
> data TopElem = Par Elems
> --             | Pre Elems
>              | Div DivInfo [Elems]
> --             | Tbl [[Elems]]
>              | RawBeg Char Int Elems -- raw beginner-style paragraph, converted to a SpecPar
>              | SpecPar Special [Elems] -- special paragraph type
>              -- special, possibly tree-shaped paragraph (outlines); the top level hangs off a root node with empty contents
>              | TreePar Special (Tree Elems)
>   deriving (Show, Eq, Typeable, Data)

Here ""Tree a"" is imported from the ""ListToTree"" module:

< data Tree a = Tree a [Tree a]
<   deriving (Show, Eq, Data, Typeable)
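
The comment on ""TreePar"" says that the top level of an outline hangs
off a node with empty contents. A minimal sketch of that shape follows,
using ""String"" labels in place of ""Elems"" and a hypothetical
""render"" helper that is not part of ""ListToTree"". It assumes the
root's children are the top-level entries:

< data Tree a = Tree a [Tree a]
<   deriving (Show, Eq)
<
< -- An outline: the root holds an empty label (assumed to mirror the empty
< -- contents at the top of a TreePar); its children are the top-level entries.
< outline :: Tree String
< outline = Tree ""
<   [ Tree "Basic types" [Tree "Document" [], Tree "TopElem" []]
<   , Tree "Urls" []
<   ]
<
< -- Hypothetical helper: list every node below the root, indented two
< -- spaces per level of depth. The root itself is skipped.
< render :: Tree String -> [String]
< render (Tree _ kids) = concatMap (go 0) kids
<   where
<     go depth (Tree label children) =
<       (replicate (2 * depth) ' ' ++ label) : concatMap (go (depth + 1)) children
<
< main :: IO ()
< main = mapM_ putStrLn (render outline)

This prints the four entries indented by depth. Because the root is
skipped, a single ""Tree"" value can still hold several top-level
entries.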

> data Special = Head1 | Head2 | Head3 | Head4 -- Heading types
>              | Code    -- code with > or < signs on the left (the parser removes them; only unlit cares which)
>              | Pre     -- Quoted literal text, with a " at the far left
>              | Comment -- MML internal comment and/or directives
>              | BList   -- bulleted list
>              | NList   -- numbered list
>              | SList   -- star list; in this case, like poetry with bold
>              | Poetry  -- outline/list with no bullets
>              | BQuote  -- block quote, indented paragraph
>   deriving (Show, Eq, Typeable, Data)

Divs create subdocuments, for example to place text in a floating
figure, in multiple columns, or in a specially centered block. For now
the only kind of ""DivInfo"" is ""Custom"", which carries a ""String"":

> data DivInfo = Custom String
>   deriving (Show, Eq, Typeable, Data)

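An atomic element (""Elem"") is plain text, an entity (a named
character such as ""ndash"", see ""Entity"" below), marked-up text
(""TT"", ""Ital"", ""Bold"" or ""VarInt"", each of which may contain
any number of atomic elements), a URL, a link, an image, or a footnote.
A ""RawLnk"" is an intermediate form that is later processed into a
link or an image.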

> type Entity = String -- "ntilde", "ndash", "nbsp" for example

> data Elem = Txt String
>           | Ent Entity
>           | TT Elems
>           | Ital Elems
>           | Bold Elems
>           | VarInt Elems
>           | UrlElem Url
>           | RawLnk Elems -- to be processed into a Lnk or Img
>           | Lnk Url Elems
>           | Img Url (Maybe Url) (Maybe AltTxt)
>           | Fnote Integer Elems -- footnote
>   deriving (Show, Eq, Typeable, Data)
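
To make the types concrete, the sketch below builds a small document: a
heading, a paragraph mixing plain text, bold text, an entity and a link,
and a two-entry bulleted list. So that it compiles on its own, it
repeats cut-down versions of only the constructors it needs, and it adds
a hypothetical ""plainText"" helper that is not part of this module. How
an address is split between the two ""Url"" fields is an assumption
here:

< -- Cut-down copies of the declarations above: only the constructors used here.
< data Url     = Url String String                      deriving Show
< data Special = Head1 | BList                          deriving Show
< data Elem    = Txt String | Ent String | Bold [Elem] | Lnk Url [Elem]
<   deriving Show
< data TopElem = Par [Elem] | SpecPar Special [[Elem]]  deriving Show
< type Document = [TopElem]
<
< -- A heading, a paragraph and a two-entry bulleted list.
< example :: Document
< example =
<   [ SpecPar Head1 [[Txt "Basic types"]]
<   , Par [ Txt "Plain and ", Bold [Txt "bold"], Txt " text", Ent "ndash"
<         , Txt "plus a ", Lnk (Url "https" "example.com") [Txt "link"] ]
<   , SpecPar BList [[Txt "first entry"], [Txt "second entry"]]
<   ]
<
< -- Hypothetical helper: keep the text and drop the markup; entities are
< -- written as &name; and links are reduced to their text.
< plainText :: [Elem] -> String
< plainText = concatMap one
<   where
<     one (Txt s)    = s
<     one (Ent e)    = "&" ++ e ++ ";"
<     one (Bold es)  = plainText es
<     one (Lnk _ es) = plainText es
<
< main :: IO ()
< main = mapM_ putStrLn [ plainText es | Par es <- example ]

Running it prints ""Plain and bold text&ndash;plus a link"". Each
bulleted entry is its own element list, matching the ""[Elems]""
argument of ""SpecPar"".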

== Urls, Links, Images

A ""Url"" could be just a ""String"", but it can also carry more
structure. Here it is split into a protocol string and a URL string:

> data Url = Url ProtocolStr UrlStr
>          -- | Intern Integer String
>            deriving (Show, Eq, Typeable, Data)

Alternate text, protocol strings and URL strings are all plain strings,
but type synonyms document which role each string plays:

> type AltTxt      = String
> type ProtocolStr = String
> type UrlStr      = String

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

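Clean up a list of elements by stripping leading whitespace from the
first element and trailing whitespace from the last one, when those are
""Txt"" elements, and then dropping every empty ""Txt"" element. This is
meant only for places where surrounding whitespace is not significant,
such as the body of a link like ""[[ http://abc  ]]"":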

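> stripWs, stripStart, stripEnd :: Elems -> Elems
> stripWs = filter (/= Txt "") . stripStart   -- strip both ends, then drop empty Txt elems
> -- stripStart strips the left side of the first element, then hands the whole
> -- list to stripEnd, so a lone Txt element is stripped on both sides
> stripStart (Txt t:es) = stripEnd (Txt (stripLeft t) : es)
> stripStart es         = stripEnd es
> -- stripEnd strips the right side of the last element, if it is a Txt
> stripEnd []      = []
> stripEnd [Txt t] = [Txt (stripRight t)]
> stripEnd (e:es)  = e : stripEnd es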
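
A quick check of the clean-up on a few inputs. So that it compiles on
its own, the sketch restates the logic against a cut-down ""Elem"" with
only ""Txt"" and ""Bold"", and repeats the string helpers defined below;
it is a restatement for illustration, not the module's own code:

< data Elem = Txt String | Bold [Elem]
<   deriving (Show, Eq)
<
< stripLeft, stripRight :: String -> String
< stripLeft  = dropWhile (`elem` " \t")
< stripRight = reverse . stripLeft . reverse
<
< -- Strip leading blanks from the first element and trailing blanks from
< -- the last one (only when they are Txt), then drop every empty Txt.
< stripWs :: [Elem] -> [Elem]
< stripWs = filter (/= Txt "") . stripEnd . stripStart
<   where
<     stripStart (Txt t : es) = Txt (stripLeft t) : es
<     stripStart es           = es
<     stripEnd []      = []
<     stripEnd [Txt t] = [Txt (stripRight t)]
<     stripEnd (e:es)  = e : stripEnd es
<
< main :: IO ()
< main = do
<   print (stripWs [Txt " http://abc  "])
<   print (stripWs [Txt "  ", Bold [Txt " x "], Txt " \t"])

The first call strips a link body on both sides. The second shows that
only the outermost ""Txt"" elements are touched: the emptied ends are
dropped, while the spaces inside ""Bold"" survive.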

> strip, stripRight, stripLeft :: String -> String

> stripRight = reverse . stripLeft . reverse -- linear time, but walks the string
>                                            -- three times (reverse, dropWhile, reverse)
> stripLeft  = dropWhile (`elem` " \t")
> strip      = stripRight . stripLeft -- same result in either order; doing the
>                                     -- left side first gives stripRight less to reverse
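
The reversals in ""stripRight"" can be avoided with ""dropWhileEnd""
from ""Data.List"" (in ""base""), which drops a matching suffix in one
traversal. A sketch that checks the alternative against the
reverse-based version; the names ""isBlank"" and ""stripRight'"" are
hypothetical and used only here:

< import Data.List (dropWhileEnd)
<
< isBlank :: Char -> Bool
< isBlank c = c `elem` " \t"
<
< stripLeft, stripRight, stripRight', strip :: String -> String
< stripLeft   = dropWhile isBlank
< stripRight  = reverse . stripLeft . reverse   -- reverse-based, as in the module
< stripRight' = dropWhileEnd isBlank            -- one traversal, no reversals
< strip       = stripRight' . stripLeft
<
< main :: IO ()
< main = do
<   let samples = ["", "  ", " a b \t", "\tx", "y  "]
<   print (map strip samples)
<   -- Both suffix strippers agree on every sample ...
<   print (all (\s -> stripRight s == stripRight' s) samples)
<   -- ... and swapping the order of the two passes in strip changes nothing.
<   print (all (\s -> strip s == stripLeft (stripRight' s)) samples)

It prints the stripped samples followed by ""True"" twice. Using
""dropWhileEnd"" in the module itself would also need an import at the
top of the file.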