%ignore

> {-# LANGUAGE DeriveDataTypeable #-}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

> module Doc where
> import Data.Generics
> import ListToTree

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%endignore

==== Document and related types

=== Basic special markup and types for structured documents

== Basic types

A document is a list of top-level elements (for example, paragraphs, lists, etc.):

> type Document = [TopElem]

A top-level element is built from lists of simple elements (for example, plain text, bold text, italic text, links or images); a bulleted list, for instance, holds one list of simple elements per entry.

> type Elems = [Elem]

> data TopElem = Par Elems
>              -- | Pre Elems
>              | Div DivInfo [Elems]
>              -- | Tbl [[Elems]]
>              | RawBeg Char Int Elems    -- raw beginner-style paragraph, converted to a SpecPar
>              | SpecPar Special [Elems]  -- special paragraph type
>              -- special, possibly tree-shaped paragraph type (outlines); the top level is a list under an empty root node
>              | TreePar Special (Tree Elems)
>              deriving (Show, Eq, Data, Typeable)

Here ""Tree a"" is imported from the ""ListToTree"" module:

< data Tree a = Tree a [Tree a]
<   deriving (Show, Eq, Data, Typeable)

> data Special = Head1 | Head2 | Head3 | Head4  -- heading types
>              | Code     -- code with > or < signs on the left (removed by the parser; only unlit cares which)
>              | Pre      -- quoted literal text, with a " at the far left
>              | Comment  -- MML internal comment and/or directives
>              | BList    -- bulleted list
>              | NList    -- numbered list
>              | SList    -- star list; in this case, like Poetry but with bold
>              | Poetry   -- outline/list with no bullets
>              | BQuote   -- block quote, indented paragraph
>              deriving (Show, Eq, Data, Typeable)

Divs let you create subdocuments, for example to place text in floating figures or multiple columns, or to set off specially centered text.

> data DivInfo = Custom String
>              deriving (Show, Eq, Data, Typeable)

An atomic element is plain text, an entity (defined just below), marked-up text (teletype, italic, bold or ""VarInt"", each of which may contain any number of atomic elements), a bare URL, an unprocessed raw link, a link, an image or a footnote.

> type Entity = String  -- "ntilde", "ndash", "nbsp" for example

> data Elem = Txt String
>           | Ent Entity
>           | TT Elems
>           | Ital Elems
>           | Bold Elems
>           | VarInt Elems
>           | UrlElem Url
>           | RawLnk Elems         -- to be processed into a Lnk or Img
>           | Lnk Url Elems
>           | Img Url (Maybe Url) (Maybe AltTxt)
>           | Fnote Integer Elems  -- footnote
>           deriving (Show, Eq, Data, Typeable)

== Urls, Links, Images

A Url could be just a String, but it can also be something more structured, for example:

> data Url = Url ProtocolStr UrlStr
>          -- | Intern Integer String
>          deriving (Show, Eq, Data, Typeable)

Alternate text, protocol strings and URL strings are all plain strings, but type synonyms make the signatures that use them easier to read.

> type AltTxt = String
> type ProtocolStr = String
> type UrlStr = String

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

Clean up a paragraph by stripping leading whitespace from its first element and trailing whitespace from its last element (when these are Txt elems), then dropping any empty Txt elems. This is only wanted in certain situations (e.g. ""[[ http://abc ]]"").

> stripWs, stripStart, stripEnd :: Elems -> Elems
> stripWs = filter (/= Txt "") . stripStart
>
> stripStart []          = []
> stripStart [Txt t]     = [Txt (strip t)]  -- a lone Txt elem is both first and last
> stripStart (Txt t1:es) = Txt (stripLeft t1) : stripEnd es
> stripStart (e:es)      = e : stripEnd es
>
> stripEnd []      = []
> stripEnd [Txt t] = [Txt (stripRight t)]
> stripEnd [e]     = [e]
> stripEnd (e:es)  = e : stripEnd es

> strip, stripRight, stripLeft :: String -> String
> stripRight s = reverse (stripLeft (reverse s))  -- roughly 2n steps for n == length s: laziness cannot
>                                                 -- skip the reversals, since reverse must walk all of s
> stripLeft = dropWhile (`elem` " \t")  -- spaces and tabs only
> strip = stripRight . stripLeft  -- either order gives the same result; stripping the left first
>                                 -- just means stripRight reverses a shorter string
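To see exactly which whitespace ""stripWs"" removes, here is a minimal standalone sketch that can be loaded into GHCi on its own (it defines no ""main""). It assumes a trimmed copy of ""Elem"" with only ""Txt"", ""Ent"" and ""Bold""; the values ""exampleLink"" and ""exampleMixed"" are hypothetical. The helpers mirror the ones above, with an explicit clause so that a lone Txt element is stripped on both sides.

```haskell
-- Trimmed, standalone copy of the Doc types: only Txt, Ent and Bold are kept.
data Elem = Txt String
          | Ent String
          | Bold [Elem]
          deriving (Show, Eq)

type Elems = [Elem]

-- Strip leading whitespace from the first elem and trailing whitespace
-- from the last (only when they are Txt), then drop empty Txt elems.
stripWs, stripStart, stripEnd :: Elems -> Elems
stripWs = filter (/= Txt "") . stripStart

stripStart []          = []
stripStart [Txt t]     = [Txt (strip t)]   -- lone Txt: strip both sides
stripStart (Txt t1:es) = Txt (stripLeft t1) : stripEnd es
stripStart (e:es)      = e : stripEnd es

stripEnd []      = []
stripEnd [Txt t] = [Txt (stripRight t)]
stripEnd [e]     = [e]
stripEnd (e:es)  = e : stripEnd es

-- Only spaces and tabs count as whitespace here.
strip, stripRight, stripLeft :: String -> String
stripRight = reverse . stripLeft . reverse
stripLeft  = dropWhile (`elem` " \t")
strip      = stripRight . stripLeft

-- Hypothetical inputs: the inside of "[[ http://abc ]]", and a mixed paragraph
-- whose nested Bold element and interior spaces should be left alone.
exampleLink, exampleMixed :: Elems
exampleLink  = [Txt " http://abc "]
exampleMixed = [Txt "  ", Bold [Txt " x "], Txt " end\t"]
```

Evaluating ""stripWs exampleMixed"" drops the whitespace-only leading Txt and the trailing tab, but leaves the nested Bold element and the space before the word end untouched: the cleanup only ever looks at the outermost edges of the paragraph.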