==== MML
==== A Wiki-like Minimal Markup Language

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 

=== Parser module

%if False

> {-# OPTIONS_GHC -fglasgow-exts #-}

> module Mmlparse where
> import Text.ParserCombinators.Parsec hiding ( Parser )
> import Text.ParserCombinators.Parsec.Prim hiding ( Parser )
> import List
> import Monad
> import Data.Generics
>
> import Doc
> import Doc2Html
> import HTMLEnt
> import GroupSquash ( groupSquash )
>
> import ListToTree

%endif

> type Parser a = GenParser Char PState a

> type PState = Integer
> emptyState :: PState
> emptyState = 0

Simple helper function to flatten a list of lists
and remove duplicates, e.g. ""["//", "**", "[[", "]]"] --> "/*[]" ""

> flattenUnion :: (Eq a) => [[a]] -> [a]
> flattenUnion = nub . concat

%if False

5-tuple pattern matcher helper functions (so we can pass them around
without having to define lambdas everywhere)

> t1of5 :: (a,b,c,d,e) -> a
> t2of5 :: (a,b,c,d,e) -> b
> t3of5 :: (a,b,c,d,e) -> c
> t4of5 :: (a,b,c,d,e) -> d
> t5of5 :: (a,b,c,d,e) -> e
> t1of5 (a,_,_,_,_) = a
> t2of5 (_,b,_,_,_) = b
> t3of5 (_,_,c,_,_) = c
> t4of5 (_,_,_,d,_) = d
> t5of5 (_,_,_,_,e) = e

%endif

%%%%%%%% %%%%%%%% %%%%%%%% %%%%%%%% %%%%%%%% %%%%%%%% %%%%%%%% %%%%%%%% %%%%%%%% 

=== Basic parsers

== Normal text

Alphanumeric plus certain puncuation are consider normal, non-special
characters in both TeX and HTML, but everything else does something
weird so we have to deal with it separately

> lowercase   = ['a'..'z']
> uppercase   = ['A'..'Z']
> alpha       = lowercase ++ uppercase
> digits      = ['0'..'9']
> spacing     = " \t"
> normalPunct = "!?.,:;()" -- for now...
> normalChars = alpha ++ digits ++ normalPunct -- ++ spacing

%%%%%%%% %%%%%%%% %%%%%%%% %%%%%%%% %%%%%%%% %%%%%%%% %%%%%%%% %%%%%%%% %%%%%%%% 

== Urls, Links, Images

What does the parser recognize as a url and not just as normal text?
The phantom protocol `lnk' is useful for relative links and links to
images, etc.

> normalProtocols  = words "https:// http:// ftp:// mailto: file:/// urn: ldap:// news: telnet://"
> phantomProtocols = words "lnk:" -- later deleted away
> linktoProtocols  = phantomProtocols ++ normalProtocols

== ""img:""

Specifies that a url is to be treated as an image in the html
`img src' sense. This protocol is a phantom protocol as well,
meaning that it will get deleted later. You could just as well
have a compound url like this: ""img:http://www.google.com/images/logo_sm.gif""

> imageProtocols   = words "img:"

These are all the currently supported protocols, but it
is trivial to add new ones.

> protocols = imageProtocols ++ linktoProtocols
> protocol :: Parser Elem
> protocol = do p <- choice $ map (try . string) protocols
>               url <- many1 $ oneOf validUrlChars
>               return $ UrlElem $ Url p url

What characters are valid in a url? Almost anything on the keyboard
but spaces and ""{}^"|<>[]"" (""[]()"" may normally be valid but
that is bad since we want to use those to group links, plus delimiters are not
``naturally'' part of a URL). From p.12 of
  http://www.ietf.org/rfc/rfc3986.txt :

> validUrlChars = "%:,/?#@!$&'*+=-._~" ++ alpha ++ digits

%%%%%%%% %%%%%%%% %%%%%%%% %%%%%%%% %%%%%%%% %%%%%%%% %%%%%%%% %%%%%%%% %%%%%%%% 

== Paragraphs and line endings

Newline characters (or other line enders, if necessary)

> enders :: [Char]
> enders = "\n"

> anEnder :: Parser Char
> anEnder = (oneOf enders <?> "a line ending character")

A solitary line ending character keeps everything in the same
paragraph; the newline character is returned. This line ender
should not be followed by any ``special beginner'' characters
(bullets in a list, etc.)

> lineEnd :: Parser Char
> lineEnd = try (do c <- anEnder
>                   notFollowedBy $ choice [anEnder, oneOf (filter (`notElem` spacing) beginners)]
>                   return c)
>           <?> "one line ending character, no special beginner chars following"

One or more line ending characters mark a new paragraph. This assumes
that the parsers that eat the internals of the paragraph keep eating
after they swallow a single line ender, unless the next line begins
with special characters (""oneOf beginners"", as above)

> parEnd :: Parser ()
> parEnd = try (do count 1 anEnder -- count 2 anEnder
>                  skipMany anEnder) <?> "two newlines, OR a newline and one of "++beginners

%%%%%%%% %%%%%%%% %%%%%%%% %%%%%%%% %%%%%%%% %%%%%%%% %%%%%%%% %%%%%%%% %%%%%%%% 

== Small simple parsers

Nothing strange. This string of text is tagged with Txt and we move on
to the next element.

> plainWordElem :: Parser Elem
> plainWordElem = (do s <- many1 $ (oneOf normalChars <?> "normal char")
>                     return $ Txt $ s) <?> "normal characters"

A single line ending. This is preserved as Txt as well.

> lineEndElem :: Parser Elem
> lineEndElem = (do c <- lineEnd
>                   return $ Txt $ " ") <?> "a line ending character" -- [c]

Multiple whitespace is treated as a single space because HTML and TeX
treat whitespace this way.

> spaceElem :: Parser Elem
> spaceElem = (do s <- many1 $ oneOf spacing
>                 return $ Txt $ " ") <?> "whitespace"

The text begins with one of the protocols, ""http://"", ""ftp:"", etc.  and
should be treated differently than normal text.

> urlElem :: Parser Elem
> urlElem = try (do u <- protocol
>                   return $ u) <?> "a url element"

== Quasi normal text

" Deal with  single *    single /     single [              ==> Ent (Literal Char)
" vs         **bold**    //italic//   [[link and/or image]] ==> Bold, Ital, RawLnk

One //quasiNormal// character is normal and gets returned as a a
Literal Entity. Two //quasiNormal// chars seen sequentially opens an
Ital, Bold, or RawLnk element and a new "many1 piece" parser.

> quasiNormalCharElem :: Parser Elem
> quasiNormalCharElem = try (do c <- oneOf quasiNormal
>                               notFollowedBy (oneOf [c])
>                               return $ Ent [c])
>            <?> ("only one of "++quasiNormal)

Pulled from the markuppers below, both opening and closing strings,
this is a simple of list of quasiNormal characters. e.g. ""/*["_$""

> quasiNormal :: [Char]
> quasiNormal = flattenUnion $ openerStrs ++ map t2of5 allMarkuppers

== Markup!

" //italic// **bold** [[link and/or image]]
" $$math$$ ""typewriter + verbatim"" __foot note__

> type OpenerString = String
> type CloserString = String
> type Markupper = (OpenerString, CloserString,
>                   PState -> Elems -> Elem,
>                   [String] -> Parser Elems,
>                   PState->PState)

Deleting ""s"" from the current list of valid markuppers at this level
makes it so that for example,

" // followed later by text and then another // then more text

treats the second ""//"" as a closing delimiter instead of an opener,
which would recurse infinitely...

> deleter s = \os -> many1 $ pieceMaker $ delete s os

> allMarkuppers = [ ("//", "//", \_->Ital, deleter "//", id),
>                   ("**", "**", \_->Bold, deleter "**", id),
>                   ("<<", ">>", \_->VarInt, \_ -> varInterpParser, id),
>                   ("[[", "]]", \_-> (lnkRawToReal . RawLnk . stripWs), deleter "[[", id),
>                   ("\"\"", "\"\"", \_->TT,   \_ -> verbatimParser, id ),
>                   ("__", "__",  Fnote,   deleter "__", (+1)) ]

Work on these some more:
"                  ("$$", "$$", \_->RawLnk, deleter "$$", id)

Pull out all the opener strings so we can iterate over them in
""markupElem"":

> openerStrs :: [OpenerString]
> openerStrs = map t1of5 allMarkuppers

Take an opener string and return its corresponding (1) closer string,
(2) markupTag constructor, or (3) middleParser custom innards
recursive parser.

> matchingCloser :: OpenerString -> CloserString
> matchingCloser s = case find (\x -> t1of5 x == s) allMarkuppers
>                    of Nothing  -> error $ "Bad markupper opening string: " ++ s
>                       (Just x) -> t2of5 x
> markupTag :: OpenerString -> (PState -> Elems -> Elem)
> markupTag s = case find (\x -> t1of5 x == s) allMarkuppers
>               of Nothing  -> error $ "Bad markupper opening string: " ++ s
>                  (Just x) -> t3of5 x
> middleParser :: OpenerString -> [String] -> Parser Elems
> middleParser s os = case find (\x -> t1of5 x == s) allMarkuppers of
>                        Nothing  -> error $ "Bad markupper opening string: " ++ s
>                        (Just x) -> (t4of5 x) os
> stateUpdater :: OpenerString -> (PState -> PState)
> stateUpdater s = case find (\x -> t1of5 x == s) allMarkuppers of
>                     Nothing  -> error $ "Bad markupper opening string: " ++ s
>                     (Just x) -> t5of5 x

//pieceMaker// takes a list of Strings that describe possible valid
markup openers for recursing on nested sub-paragraphs (using
//markupElem//) and returns a Parser Elem that can be used with many1
to give you "many pieces"

> markupElem :: [OpenerString] -> Parser Elem
> markupElem openers = (do s <- try $ choice $ map string openers
>                          updateState $ stateUpdater s
>                          ps <- middleParser s openers
>                          close <- option "" $ string $ matchingCloser s
>                          n <- getState
>                          return $ (markupTag s) n $ ps) <?> "markedup text"

%%%%%%%% %%%%%%%% %%%%%%%% %%%%%%%% %%%%%%%% %%%%%%%% %%%%%%%% %%%%%%%% %%%%%

== Entities

to deal with non-normal text low level elements, even some
kinds of ASCII text: the characters that break TeX or HTML we escape
ourselves this way. For example, We need entities like ""< & + $ % >
(~ or nbsp) _ # \"" to be escpaed for either TeX or HTML. Plus we can
add special characters or accent characters too.

> multicharEntityElem :: Parser Elem
> multicharEntityElem = choice $ map specialMaker specialEntSeqs

> singlecharEntityElem :: Parser Elem
> singlecharEntityElem = choice $ map normalMaker escapeNeededChars

Sort the list such that the longest strings are first (so longer
matches occur before shorter ones), i.e. "example" matches before "ex"
but if it fails, "ex" can then try and match. The other way around,
"example" would never match.

> specialEntSeqs = sortBy cmpLengthFst list
>   where { list =
>     [ ("~~",     "nbsp"),  -- a non breaking space, ~ in TeX and &nbsp; in HTML
>       ("`",      "lsquo"), -- left and right single quote (back tick and apostrophe, pretty much)
>       ("'",      "rsquo"), --  `        '
>       ("``",     "ldquo"), -- left and right double quotes
>       ("''",     "rdquo"), --  ``       ''
>       ("--",     "ndash"), -- a dash a little longer than a hyphen, expressing range, e.g. pp. 4--5.
>       ("---",    "mdash"), -- even longer than that, expressing---if you can believe it---caesura
>       ("->",     "rarr"),  -- some arrows for convenience
>       ("<-",     "larr"),
>       ("=>",     "rArr"),
>       ("<=",     "lArr"),
>       ("-->",    "rarr"),
>       ("<--",    "larr"),
>       ("==>",    "rArr"),
>       ("<==",    "lArr"),
>       ("\\ae\\", "aelig"), -- aelig i.e. a and e tied together as one character
>       ("\\AE\\", "AElig"), -- AElig i.e. A and E tied together as one character, uppercase
>       ("\\~n\\", "ntilde"),-- ~n "enye" the letter in Espa\~n\ol
>       ("\\!\\",  "iexcl")  -- \!\ay! inverted exclamation point
>     ] ++ map umlmaker "aAeEiIoOuUyY" -- \"X\ where X is a vowel -- make umlauted vowels
>           where { umlmaker x = ("\\\""++[x]++"\\", [x]++"uml") } }
>
> cmpLengthFst x y = compare (length (fst y)) (length (fst x))
> specialMaker (s,tag) = try $ do { s <- string s; return $ Ent tag }

> escapeNeededChars = "~%@#+-=<|>&" -- $_/*[]\" --- this is not true: quasiNormal takes care of $, _ etc.

> normalMaker :: Char -> Parser Elem
> normalMaker c = do { s <- string [c]; return $ Ent [c] }

%%%%%%%% %%%%%%%% %%%%%%%% %%%%%%%% %%%%%%%% %%%%%%%% %%%%%%%% %%%%%%%% %%%%%

Blessed are the pieceMakers...

> pieceMaker :: [OpenerString] -> Parser Elem
> pieceMaker openers = (choice [ spaceElem, urlElem, plainWordElem,
>                                markupElem openers,
>                                multicharEntityElem,
>                                quasiNormalCharElem,
>                                singlecharEntityElem,
>                                lineEndElem ]) <?> "a low level element (possible fix: "++
>                                                   "close markup before paragraph or line ending)"

""<<identifier>>"" is a sort of template replacement to stick the
document represented by ""identifier"" from the environment into the
document at this point.  ""varInterpParser"" parses strings of
characters satisfying:

> varInterpParser :: Parser Elems
> varInterpParser = do c <- oneOf alpha
>                      s <- many1 $ oneOf $ alpha++digits++"_"
>                      return$ [ Txt $ [c]++s ]

Verbatim text is contained in ""double-quotes"" and inside the doulbe
quotes, one quote character is considered automatically escaped. The
only characters not literally quoted are quote and backslash. ""\\\"""
is interpreted as a literal quote and backslash-backslash, (""\\\\"")
which is treated as a single backslash, e.g. ""\\\"\\\""" --> ""\"\"""

> aDoubleQuote :: Parser Char
> aDoubleQuote = char '"'

> verbatimParser :: Parser Elems
> verbatimParser = do es <- many1 $ choice [ verbLitParser, onlyOneQuote, lineEndElem, verbEscapeSeq ]
>                     return es

> onlyOneQuote, verbLitParser, verbEscapeSeq :: Parser Elem

> verbEscapeSeq = do char '\\'
>                    c <- oneOf "\\\"" -- either backslash or quote
>                    return $ Txt [c]

> onlyOneQuote = try (do c <- aDoubleQuote
>                        notFollowedBy aDoubleQuote
>                        return $ Txt [c]) <?> "exactly one double quote character"

> verbLitParser = (do s <- many1 $ verbLitCharParser
>                     return $ Txt $ htmlEntityEncode s)
>                     <?> "at least one non-quote, non-backslash, non-new-line character."
> verbLitCharParser :: Parser Char
> verbLitCharParser = (oneOf $ verbLitChars)
>                     <?> "any character except double-quote, backslash, or new line"

> verbLitChars = spacing++normalChars++"{}^|<>[]%:/?#@!$&`'()*+,;=-._~"

%%%%%%%% %%%%%%%% %%%%%%%% %%%%%%%% %%%%%%%% %%%%%%%% %%%%%%%% %%%%%%%% %%%%%%%% 

Clean up a paragraph by stripping leading and trailing whitespace off
of Txt elems, but only in certain situations (e.g. ""[[ http://abc  ]]"")

%%%%%%%% %%%%%%%% %%%%%%%% %%%%%%%% %%%%%%%% %%%%%%%% %%%%%%%% %%%%%%%% %%%%%%%% 

== Special kinds of paragraphs:
= headings, lists/outlines, block quotes, code, comments

" =   headings
" #   list/outline with numbers. explicit numbers allow arbitrary count start (later to be done)
" -   list/outline with bullets
" *   list/outline with bullets (how is this different?)
" :   list/outline with no bullets (each : is one identation)
"     indented with spaces or tabs = (blockquote) paragraph. 
" >   code
" <   code
" %   comment
" "   pre-formatted, literal

Normal beginners

> normalBegs = spacing ++ "=#*-:"
> beginners = literalBegs ++ normalBegs

> beginnerStr :: Parser (Char,Int)
> beginnerStr = do seq <- choice beginnerSeqs
>                  option "" (many1 $ oneOf spacing)
>                  return $ (head seq, length seq)

> beginnerSeqs :: [Parser String]
> beginnerSeqs = map (many1 . char) normalBegs

> normalBeginnerPar :: Parser TopElem
> normalBeginnerPar = try $
>     do (kind,number) <- beginnerStr
>        es <- many paragraphParts -- this is normal paragraph innards
>        return $ RawBeg kind number es

The characters ""%"", "">"", ""\""" and ""<"" code blocks read all
characters literally and don't try to treat code or comments as
marked-up text! They also have two addtional requirements: (1) use
only one of them, and (2) put a single space after them, i.e. ""% comment""

> literalBegs = "%<>\""

> literalSeqs :: [Parser String]
> literalSeqs = map (many1 . char) literalBegs

> literalBegStr :: Parser (Char,Int)
> literalBegStr = do seq <- choice literalSeqs
>                    oneOf spacing
>                    return $ (head seq, length seq)

> literalBeginnerPar = try $
>     do (kind,number) <- literalBegStr
>        es <- literalPar -- this parser reads all text literally
>        return $ RawBeg kind number es

> specialBeginnerPar :: Parser TopElem
> specialBeginnerPar = choice [literalBeginnerPar, normalBeginnerPar]

> literalPar :: Parser Elems
> literalPar = do cs <- many $ noneOf enders
>                 return $ [Txt cs]

%%%%%%%% %%%%%%%% %%%%%%%% %%%%%%%% %%%%%%%% %%%%%%%% %%%%%%%% %%%%%%%% %%%%%%%% 

> simplePar :: Parser TopElem
> simplePar = do ps <- many1 paragraphParts
>                return $ Par ps
>
> paragraphParts :: Parser Elem
> paragraphParts = pieceMaker openerStrs

%%%%%%%% %%%%%%%% %%%%%%%% %%%%%%%% %%%%%%%% %%%%%%%% %%%%%%%% %%%%%%%% %%%%%%%% 

=== Putting it all together

== Paragraphs

> topLevelPar :: Parser TopElem
> topLevelPar = choice [specialBeginnerPar, simplePar]

Paragraphs separated by parEnd tokens form a Document

> aDoc :: Parser Document
> aDoc = (topLevelPar `sepEndBy` parEnd) <?>
>        "top level paragraphs separated by paragraph endings "++
>        "(usually two or more newlines)"

""doc"" of a string is the parsed structured document from the input
string or a little document telling where the error was if it fails

> doc :: String -> Document
> doc input = case runParser aDoc emptyState "" (filter (/='\r') input) of
>               Left err -> [Par [Txt $ "Parse error at " ++ (show err)]]
>               Right ps -> postParParse $ filter nonEmpty ps
>                   where nonEmpty (Par p)        = p /= []
>                         nonEmpty (RawBeg _ _ p) = p /= []
>                         nonEmpty (SpecPar _ ps) = ps/= [] --TODO filter sublists

%%%%%%%% %%%%%%%% %%%%%%%% %%%%%%%% %%%%%%%% %%%%%%%% %%%%%%%% %%%%%%%% %%%%%%%% 

Post-process and collect up like TopElem paragraphs and specialize
them, getting rid of all ""RawBeg"" TopElems

> postParParse :: [TopElem] -> [TopElem]
> postParParse = groupSquash parEq squashTopElems

Group-and-squasher utility function:

< groupSquash :: (a->a->Bool) -> ([a]->[a]) -> [a] -> [a]
< groupSquash eq squasher items =
<     foldr1 (++) $ map squasher $ groupBy eq items

Example usage of this sweet utility function:

< let squasher xs = case head xs of
<                     's' -> "$"
<                     'p' -> map (const 'q') xs
<                     _   -> xs
<   in
<      groupSquash (==) squasher "Mississippi"

" ==> Mi$i$iqqi

Now back to the Document. Are two TopElems alike?

> parEq :: TopElem -> TopElem -> Bool
> parEq (Par _)        (Par _)        = True
> parEq (RawBeg _ _ _) (RawBeg _ _ _) = True
> parEq _              _              = False

Given a list of TopElems of one type (i.e. all Pars or Begs), flatten
the Begs and return the Pars or Pres as is

> squashTopElems :: [TopElem] -> [TopElem]
> squashTopElems ps =
>     case head ps of
>       (Par _)        -> ps
>       (RawBeg _ _ _) -> groupSquash begEq begSquash ps
>       _              -> error$ "Unsupported"

Group and squash RawBeg-type TopElems, again with the comparison and
squasher functions.

> begEq :: TopElem -> TopElem -> Bool
> begEq (RawBeg '='  i _) (RawBeg '='  i' _) = i==i' -- heading nesting levels are different blocks
> begEq (RawBeg '%'  i _) (RawBeg '%'  i' _) = i==i' -- comments %% and %%% are different blocks
> begEq (RawBeg '<'  i _) (RawBeg '<'  i' _) = i==i' -- code blocks < and << are different blocks
> begEq (RawBeg '>'  i _) (RawBeg '>'  i' _) = i==i' -- code blocks > and >> are different blocks
> begEq (RawBeg ' '  i _) (RawBeg ' '  i' _) = False --i==i' -- ?block quotes ' ' and '  ' are diff't blocks?
> begEq (RawBeg '\t' i _) (RawBeg '\t' i' _) = i==i' -- ditto
> begEq (RawBeg c   _  _) (RawBeg c'  _   _) = c==c' -- everything else: nesting won't matter for eq
> begEq _                 _                  = False

Given a list of Begs of one type (i.e. all same char), flatten the
Begs into a Head1, etc. elem

> begSquash :: [TopElem] -> [TopElem]
> begSquash ps =
>    let chopped = map extractEs ps -- # * - : where nesting does something
>        outline = pairsToOutline $ map extractNestEs ps -- where we need the number value:
>                                                        -- =, %, <, >, space don't
>    in case head ps of
>       (RawBeg ' ' _ _) -> [SpecPar BQuote chopped]
>       (RawBeg '<' _ _) -> [SpecPar Code chopped]
>       (RawBeg '>' _ _) -> [SpecPar Code chopped]
>       (RawBeg '%' _ _) -> [SpecPar Comment chopped]
>       (RawBeg '"' _ _) -> [SpecPar Pre chopped]

>       (RawBeg '=' 1 _) -> [SpecPar Head1 chopped]
>       (RawBeg '=' 2 _) -> [SpecPar Head2 chopped]
>       (RawBeg '=' 3 _) -> [SpecPar Head3 chopped]
>       (RawBeg '=' 4 _) -> [SpecPar Head4 chopped]
>       (RawBeg '=' i _) -> error$ "Head level of 1,2,3, or 4 expected. Found head"++show i++"."

>       (RawBeg '-' _ _) -> [TreePar BList outline]
>       (RawBeg '*' _ _) -> [TreePar SList outline]
>       (RawBeg ':' _ _) -> [TreePar Poetry outline]
>       (RawBeg '#' _ _) -> [TreePar NList outline]
>       (RawBeg _   _ _) -> error$ "Unsupported type of special paragraph: " ++ show ps

> extractEs :: TopElem -> Elems
> extractEs (RawBeg _ _ es) = es
> extractEs _               = error "We are only dealing with RawBegs here. This shouldn't happen."

> extractNestEs :: TopElem -> (Int, Elems)
> extractNestEs (RawBeg _ i es) = (i, es)
> extractNestEs _               = error "We are only dealing with RawBegs here. This shouldn't happen."

> pairsToOutline :: [(Int, Elems)] -> (Tree Elems)
> pairsToOutline = pairListToTree

%%%%%%%% %%%%%%%% %%%%%%%% %%%%%%%% %%%%%%%% %%%%%%%% %%%%%%%% %%%%%%%% %%%%%%%% 

=== Restructure RawLnks

Turn the handy simple img/lnk notation into sensibly structured data

== Flatten Elems
= into a string for use as AltTxt to an image

> altTextOf :: Elems -> Maybe AltTxt
> altTextOf es = case toRawString es of { "" -> Nothing ; x -> Just x }

> toRawString :: (Data a, Typeable a) => a -> String
> toRawString e = strip $ stringize e

Collect and concatenate all strings in the given document or element
or toplevel or whatever!

> stringize :: (Data a, Typeable a) => a -> String
> stringize t = str t
>     where str :: (Data a, Typeable a) => a -> String
>           str s = case cast s of
>                     Just s  -> s -- stop recursion on a string, hopefully
>                     Nothing -> findLits s
>           findLits :: (Typeable a, Data a) => a -> String
>           findLits t = case cast_ent t of -- render literals literally!
>                          Just ent -> case length ent of { 1 -> ent; _ -> entToString ent }
>                          Nothing -> concat (gmapQ stringize t) -- recurse and look for strings

>           cast_ent  :: (Typeable a) => a -> Maybe String
>           cast_ent e = case cast_elem e of
>                         Just elm -> pulloutEnt elm
>                         Nothing -> Nothing
>           cast_elem :: (Typeable a) => a -> Maybe Elem
>           cast_elem = cast
>           pulloutEnt :: Elem -> Maybe String
>           pulloutEnt (Ent e) = Just e
>           pulloutEnt _       = Nothing

> entToString :: Entity -> String
> entToString e =
>     case revlookup e specialEntSeqs of
>       Nothing -> error$ "Unrecognized entity in attempt to convert to raw text: "++ show e
>       Just s -> s

Look up a key in a [(value, key)] environment (specialEntSeqs is
shaped this way)

> revlookup :: Eq a => a -> [(b, a)] -> Maybe b
> revlookup k env = lookup k flipenv
>     where flipenv = map flip2tup env
>           flip2tup (a,b) = (b,a)

== Strip phantom protocol

Throw away the made up protocols such as ""lnk:""

> phantomStrip :: Url -> Url
> phantomStrip (Url p u) = if p `elem` phantomProtocols then (Url "" u) else (Url p u)

> unphantom :: ProtocolStr -> ProtocolStr
> unphantom p = if p `elem` phantomProtocols then "" else p

Pull out possible img or target for a link

> imgOfUrl, targetUrl :: Elem -> Maybe Url
> imgOfUrl (UrlElem (Url p u)) = if p `elem` imageProtocols then Just (Url "" u) else Nothing
> imgOfUrl _ = Nothing
> targetUrl (UrlElem (Url p u)) = if p `elem` linktoProtocols then Just (Url p u) else Nothing
> targetUrl _ = Nothing

Ensure that a given element is not a UrlElem so we can pass that along
unaltered.

> nonUrl :: Elem -> Elems -- singleton or empty list, i.e. [e] or []
> nonUrl (UrlElem _) = []
> nonUrl e           = [e]

Convert a RawLnk to Img or Lnk, if possible

> lnkRawToReal = realLnkToElem . rawLnkToLnk

Collect up relevant information in a link grouping. We automate this
so that (1) we don't have to be too verbose, and (2) we don't have to
remember the order of "arguments". All combinations should "do the
right thing".

> rawLnkToLnk :: Elem -> Elem
> rawLnkToLnk (RawLnk es) = realLnk es [] Nothing Nothing
> rawLnkToLnk x = x -- do nothing to non-RawLnk elements
>
> realLnk :: Elems -> Elems -> (Maybe Url) -> (Maybe Url) -> Elem
> -- ran out of input elements, see what we have accumulated as im or target
> realLnk [] es im target =
>     case im of
>     { Nothing -> case target of Nothing -> RawLnk es
>                                 Just t  -> let t'@(Url p u) = phantomStrip t in
>                                               if es /= [] then Lnk t' es
>                                               else Lnk t' [Txt (p++u)]
>     ; Just i -> Img i target (altTextOf es) }
> -- see what the head of input elements is and update our im or target accumulator
> realLnk (e:es) tailEs im target =
>     let im'   = imgOfUrl e
>         targ' = case im' of Nothing -> targetUrl e
>                             _       -> Nothing
>         e'    = nonUrl e -- singleton or empty list, i.e. [e] or []
>     in realLnk es (tailEs ++ e') (im `mplus` im') (target `mplus` targ')

Finish processing the ""Elem""s with Urls, Imgs, or Lnks to be in final form.

> realLnkToElem :: Elem -> Elem
> realLnkToElem (UrlElem url) =
>     let (Url p u) = url in
>         if p `elem` imageProtocols then Img (Url "" u) Nothing Nothing  --"<img src=\""++u++"\" />"
>         else let (Url p' u') = phantomStrip url in UrlElem (Url p' u')
>                  --"<a href=\""++p'++u'++"\">" ++p'++u'++ "</a>"
> realLnkToElem (Img (Url _ i) targ a) =
> --  let altt = case a of Nothing -> ""
> --                       Just s  -> "alt=\""++s++"\"" in
>     case targ of Nothing -> Img (Url "" i) Nothing a --"<img src=\"" ++ i ++ "\" "++ altt ++" />"
>                  Just (Url p u) -> Img (Url "" i) (Just (Url (unphantom p) u)) a
>                                    -- "<a href=\""++unphantom p++u++"\">"++
>                                    -- "<img border=\"0\" src=\"" ++ i ++ "\" "++
>                                    -- altt ++" /></a>"
> realLnkToElem (Lnk (Url p u) es) = Lnk (Url p u) (stripWs es)
> --    "<a href=\"" ++p++u++ "\">" ++ strip (elemsToHtmlString es) ++ "</a>"
> realLnkToElem x = x -- identity on everything else

%%%%%%%% %%%%%%%% %%%%%%%% %%%%%%%% %%%%%%%% %%%%%%%% %%%%%%%% %%%%%%%% %%%%%%%% 

> test p s = runParser p emptyState "" s

Example usage:

< test entityElem "<--"