==== MML ==== A Wiki-like Minimal Markup Language %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% === Parser module %if False > {-# OPTIONS_GHC -fglasgow-exts #-} > module Mmlparse where > import Text.ParserCombinators.Parsec hiding ( Parser ) > import Text.ParserCombinators.Parsec.Prim hiding ( Parser ) > import List > import Monad > import Data.Generics > > import Doc > import Doc2Html > import HTMLEnt > import GroupSquash ( groupSquash ) > > import ListToTree %endif > type Parser a = GenParser Char PState a > type PState = Integer > emptyState :: PState > emptyState = 0 Simple helper function to flatten a list of lists and remove duplicates, e.g. ""["//", "**", "[[", "]]"] --> "/*[]" "" > flattenUnion :: (Eq a) => [[a]] -> [a] > flattenUnion = nub . concat %if False 5-tuple pattern matcher helper functions (so we can pass them around without having to define lambdas everywhere) > t1of5 :: (a,b,c,d,e) -> a > t2of5 :: (a,b,c,d,e) -> b > t3of5 :: (a,b,c,d,e) -> c > t4of5 :: (a,b,c,d,e) -> d > t5of5 :: (a,b,c,d,e) -> e > t1of5 (a,_,_,_,_) = a > t2of5 (_,b,_,_,_) = b > t3of5 (_,_,c,_,_) = c > t4of5 (_,_,_,d,_) = d > t5of5 (_,_,_,_,e) = e %endif %%%%%%%% %%%%%%%% %%%%%%%% %%%%%%%% %%%%%%%% %%%%%%%% %%%%%%%% %%%%%%%% %%%%%%%% === Basic parsers == Normal text Alphanumeric plus certain puncuation are consider normal, non-special characters in both TeX and HTML, but everything else does something weird so we have to deal with it separately > lowercase = ['a'..'z'] > uppercase = ['A'..'Z'] > alpha = lowercase ++ uppercase > digits = ['0'..'9'] > spacing = " \t" > normalPunct = "!?.,:;()" -- for now... > normalChars = alpha ++ digits ++ normalPunct -- ++ spacing %%%%%%%% %%%%%%%% %%%%%%%% %%%%%%%% %%%%%%%% %%%%%%%% %%%%%%%% %%%%%%%% %%%%%%%% == Urls, Links, Images What does the parser recognize as a url and not just as normal text? The phantom protocol `lnk' is useful for relative links and links to images, etc. > normalProtocols = words "https:// http:// ftp:// mailto: file:/// urn: ldap:// news: telnet://" > phantomProtocols = words "lnk:" -- later deleted away > linktoProtocols = phantomProtocols ++ normalProtocols == ""img:"" Specifies that a url is to be treated as an image in the html `img src' sense. This protocol is a phantom protocol as well, meaning that it will get deleted later. You could just as well have a compound url like this: ""img:http://www.google.com/images/logo_sm.gif"" > imageProtocols = words "img:" These are all the currently supported protocols, but it is trivial to add new ones. > protocols = imageProtocols ++ linktoProtocols > protocol :: Parser Elem > protocol = do p <- choice $ map (try . string) protocols > url <- many1 $ oneOf validUrlChars > return $ UrlElem $ Url p url What characters are valid in a url? Almost anything on the keyboard but spaces and ""{}^"|<>[]"" (""[]()"" may normally be valid but that is bad since we want to use those to group links, plus delimiters are not ``naturally'' part of a URL). From p.12 of http://www.ietf.org/rfc/rfc3986.txt : > validUrlChars = "%:,/?#@!$&'*+=-._~" ++ alpha ++ digits %%%%%%%% %%%%%%%% %%%%%%%% %%%%%%%% %%%%%%%% %%%%%%%% %%%%%%%% %%%%%%%% %%%%%%%% == Paragraphs and line endings Newline characters (or other line enders, if necessary) > enders :: [Char] > enders = "\n" > anEnder :: Parser Char > anEnder = (oneOf enders "a line ending character") A solitary line ending character keeps everything in the same paragraph; the newline character is returned. This line ender should not be followed by any ``special beginner'' characters (bullets in a list, etc.) > lineEnd :: Parser Char > lineEnd = try (do c <- anEnder > notFollowedBy $ choice [anEnder, oneOf (filter (`notElem` spacing) beginners)] > return c) > "one line ending character, no special beginner chars following" One or more line ending characters mark a new paragraph. This assumes that the parsers that eat the internals of the paragraph keep eating after they swallow a single line ender, unless the next line begins with special characters (""oneOf beginners"", as above) > parEnd :: Parser () > parEnd = try (do count 1 anEnder -- count 2 anEnder > skipMany anEnder) "two newlines, OR a newline and one of "++beginners %%%%%%%% %%%%%%%% %%%%%%%% %%%%%%%% %%%%%%%% %%%%%%%% %%%%%%%% %%%%%%%% %%%%%%%% == Small simple parsers Nothing strange. This string of text is tagged with Txt and we move on to the next element. > plainWordElem :: Parser Elem > plainWordElem = (do s <- many1 $ (oneOf normalChars "normal char") > return $ Txt $ s) "normal characters" A single line ending. This is preserved as Txt as well. > lineEndElem :: Parser Elem > lineEndElem = (do c <- lineEnd > return $ Txt $ " ") "a line ending character" -- [c] Multiple whitespace is treated as a single space because HTML and TeX treat whitespace this way. > spaceElem :: Parser Elem > spaceElem = (do s <- many1 $ oneOf spacing > return $ Txt $ " ") "whitespace" The text begins with one of the protocols, ""http://"", ""ftp:"", etc. and should be treated differently than normal text. > urlElem :: Parser Elem > urlElem = try (do u <- protocol > return $ u) "a url element" == Quasi normal text " Deal with single * single / single [ ==> Ent (Literal Char) " vs **bold** //italic// [[link and/or image]] ==> Bold, Ital, RawLnk One //quasiNormal// character is normal and gets returned as a a Literal Entity. Two //quasiNormal// chars seen sequentially opens an Ital, Bold, or RawLnk element and a new "many1 piece" parser. > quasiNormalCharElem :: Parser Elem > quasiNormalCharElem = try (do c <- oneOf quasiNormal > notFollowedBy (oneOf [c]) > return $ Ent [c]) > ("only one of "++quasiNormal) Pulled from the markuppers below, both opening and closing strings, this is a simple of list of quasiNormal characters. e.g. ""/*["_$"" > quasiNormal :: [Char] > quasiNormal = flattenUnion $ openerStrs ++ map t2of5 allMarkuppers == Markup! " //italic// **bold** [[link and/or image]] " $$math$$ ""typewriter + verbatim"" __foot note__ > type OpenerString = String > type CloserString = String > type Markupper = (OpenerString, CloserString, > PState -> Elems -> Elem, > [String] -> Parser Elems, > PState->PState) Deleting ""s"" from the current list of valid markuppers at this level makes it so that for example, " // followed later by text and then another // then more text treats the second ""//"" as a closing delimiter instead of an opener, which would recurse infinitely... > deleter s = \os -> many1 $ pieceMaker $ delete s os > allMarkuppers = [ ("//", "//", \_->Ital, deleter "//", id), > ("**", "**", \_->Bold, deleter "**", id), > ("<<", ">>", \_->VarInt, \_ -> varInterpParser, id), > ("[[", "]]", \_-> (lnkRawToReal . RawLnk . stripWs), deleter "[[", id), > ("\"\"", "\"\"", \_->TT, \_ -> verbatimParser, id ), > ("__", "__", Fnote, deleter "__", (+1)) ] Work on these some more: " ("$$", "$$", \_->RawLnk, deleter "$$", id) Pull out all the opener strings so we can iterate over them in ""markupElem"": > openerStrs :: [OpenerString] > openerStrs = map t1of5 allMarkuppers Take an opener string and return its corresponding (1) closer string, (2) markupTag constructor, or (3) middleParser custom innards recursive parser. > matchingCloser :: OpenerString -> CloserString > matchingCloser s = case find (\x -> t1of5 x == s) allMarkuppers > of Nothing -> error $ "Bad markupper opening string: " ++ s > (Just x) -> t2of5 x > markupTag :: OpenerString -> (PState -> Elems -> Elem) > markupTag s = case find (\x -> t1of5 x == s) allMarkuppers > of Nothing -> error $ "Bad markupper opening string: " ++ s > (Just x) -> t3of5 x > middleParser :: OpenerString -> [String] -> Parser Elems > middleParser s os = case find (\x -> t1of5 x == s) allMarkuppers of > Nothing -> error $ "Bad markupper opening string: " ++ s > (Just x) -> (t4of5 x) os > stateUpdater :: OpenerString -> (PState -> PState) > stateUpdater s = case find (\x -> t1of5 x == s) allMarkuppers of > Nothing -> error $ "Bad markupper opening string: " ++ s > (Just x) -> t5of5 x //pieceMaker// takes a list of Strings that describe possible valid markup openers for recursing on nested sub-paragraphs (using //markupElem//) and returns a Parser Elem that can be used with many1 to give you "many pieces" > markupElem :: [OpenerString] -> Parser Elem > markupElem openers = (do s <- try $ choice $ map string openers > updateState $ stateUpdater s > ps <- middleParser s openers > close <- option "" $ string $ matchingCloser s > n <- getState > return $ (markupTag s) n $ ps) "markedup text" %%%%%%%% %%%%%%%% %%%%%%%% %%%%%%%% %%%%%%%% %%%%%%%% %%%%%%%% %%%%%%%% %%%%% == Entities to deal with non-normal text low level elements, even some kinds of ASCII text: the characters that break TeX or HTML we escape ourselves this way. For example, We need entities like ""< & + $ % > (~ or nbsp) _ # \"" to be escpaed for either TeX or HTML. Plus we can add special characters or accent characters too. > multicharEntityElem :: Parser Elem > multicharEntityElem = choice $ map specialMaker specialEntSeqs > singlecharEntityElem :: Parser Elem > singlecharEntityElem = choice $ map normalMaker escapeNeededChars Sort the list such that the longest strings are first (so longer matches occur before shorter ones), i.e. "example" matches before "ex" but if it fails, "ex" can then try and match. The other way around, "example" would never match. > specialEntSeqs = sortBy cmpLengthFst list > where { list = > [ ("~~", "nbsp"), -- a non breaking space, ~ in TeX and   in HTML > ("`", "lsquo"), -- left and right single quote (back tick and apostrophe, pretty much) > ("'", "rsquo"), -- ` ' > ("``", "ldquo"), -- left and right double quotes > ("''", "rdquo"), -- `` '' > ("--", "ndash"), -- a dash a little longer than a hyphen, expressing range, e.g. pp. 4--5. > ("---", "mdash"), -- even longer than that, expressing---if you can believe it---caesura > ("->", "rarr"), -- some arrows for convenience > ("<-", "larr"), > ("=>", "rArr"), > ("<=", "lArr"), > ("-->", "rarr"), > ("<--", "larr"), > ("==>", "rArr"), > ("<==", "lArr"), > ("\\ae\\", "aelig"), -- aelig i.e. a and e tied together as one character > ("\\AE\\", "AElig"), -- AElig i.e. A and E tied together as one character, uppercase > ("\\~n\\", "ntilde"),-- ~n "enye" the letter in Espa\~n\ol > ("\\!\\", "iexcl") -- \!\ay! inverted exclamation point > ] ++ map umlmaker "aAeEiIoOuUyY" -- \"X\ where X is a vowel -- make umlauted vowels > where { umlmaker x = ("\\\""++[x]++"\\", [x]++"uml") } } > > cmpLengthFst x y = compare (length (fst y)) (length (fst x)) > specialMaker (s,tag) = try $ do { s <- string s; return $ Ent tag } > escapeNeededChars = "~%@#+-=<|>&" -- $_/*[]\" --- this is not true: quasiNormal takes care of $, _ etc. > normalMaker :: Char -> Parser Elem > normalMaker c = do { s <- string [c]; return $ Ent [c] } %%%%%%%% %%%%%%%% %%%%%%%% %%%%%%%% %%%%%%%% %%%%%%%% %%%%%%%% %%%%%%%% %%%%% Blessed are the pieceMakers... > pieceMaker :: [OpenerString] -> Parser Elem > pieceMaker openers = (choice [ spaceElem, urlElem, plainWordElem, > markupElem openers, > multicharEntityElem, > quasiNormalCharElem, > singlecharEntityElem, > lineEndElem ]) "a low level element (possible fix: "++ > "close markup before paragraph or line ending)" ""<>"" is a sort of template replacement to stick the document represented by ""identifier"" from the environment into the document at this point. ""varInterpParser"" parses strings of characters satisfying: > varInterpParser :: Parser Elems > varInterpParser = do c <- oneOf alpha > s <- many1 $ oneOf $ alpha++digits++"_" > return$ [ Txt $ [c]++s ] Verbatim text is contained in ""double-quotes"" and inside the doulbe quotes, one quote character is considered automatically escaped. The only characters not literally quoted are quote and backslash. ""\\\""" is interpreted as a literal quote and backslash-backslash, (""\\\\"") which is treated as a single backslash, e.g. ""\\\"\\\""" --> ""\"\""" > aDoubleQuote :: Parser Char > aDoubleQuote = char '"' > verbatimParser :: Parser Elems > verbatimParser = do es <- many1 $ choice [ verbLitParser, onlyOneQuote, lineEndElem, verbEscapeSeq ] > return es > onlyOneQuote, verbLitParser, verbEscapeSeq :: Parser Elem > verbEscapeSeq = do char '\\' > c <- oneOf "\\\"" -- either backslash or quote > return $ Txt [c] > onlyOneQuote = try (do c <- aDoubleQuote > notFollowedBy aDoubleQuote > return $ Txt [c]) "exactly one double quote character" > verbLitParser = (do s <- many1 $ verbLitCharParser > return $ Txt $ htmlEntityEncode s) > "at least one non-quote, non-backslash, non-new-line character." > verbLitCharParser :: Parser Char > verbLitCharParser = (oneOf $ verbLitChars) > "any character except double-quote, backslash, or new line" > verbLitChars = spacing++normalChars++"{}^|<>[]%:/?#@!$&`'()*+,;=-._~" %%%%%%%% %%%%%%%% %%%%%%%% %%%%%%%% %%%%%%%% %%%%%%%% %%%%%%%% %%%%%%%% %%%%%%%% Clean up a paragraph by stripping leading and trailing whitespace off of Txt elems, but only in certain situations (e.g. ""[[ http://abc ]]"") %%%%%%%% %%%%%%%% %%%%%%%% %%%%%%%% %%%%%%%% %%%%%%%% %%%%%%%% %%%%%%%% %%%%%%%% == Special kinds of paragraphs: = headings, lists/outlines, block quotes, code, comments " = headings " # list/outline with numbers. explicit numbers allow arbitrary count start (later to be done) " - list/outline with bullets " * list/outline with bullets (how is this different?) " : list/outline with no bullets (each : is one identation) " indented with spaces or tabs = (blockquote) paragraph. " > code " < code " % comment " " pre-formatted, literal Normal beginners > normalBegs = spacing ++ "=#*-:" > beginners = literalBegs ++ normalBegs > beginnerStr :: Parser (Char,Int) > beginnerStr = do seq <- choice beginnerSeqs > option "" (many1 $ oneOf spacing) > return $ (head seq, length seq) > beginnerSeqs :: [Parser String] > beginnerSeqs = map (many1 . char) normalBegs > normalBeginnerPar :: Parser TopElem > normalBeginnerPar = try $ > do (kind,number) <- beginnerStr > es <- many paragraphParts -- this is normal paragraph innards > return $ RawBeg kind number es The characters ""%"", "">"", ""\""" and ""<"" code blocks read all characters literally and don't try to treat code or comments as marked-up text! They also have two addtional requirements: (1) use only one of them, and (2) put a single space after them, i.e. ""% comment"" > literalBegs = "%<>\"" > literalSeqs :: [Parser String] > literalSeqs = map (many1 . char) literalBegs > literalBegStr :: Parser (Char,Int) > literalBegStr = do seq <- choice literalSeqs > oneOf spacing > return $ (head seq, length seq) > literalBeginnerPar = try $ > do (kind,number) <- literalBegStr > es <- literalPar -- this parser reads all text literally > return $ RawBeg kind number es > specialBeginnerPar :: Parser TopElem > specialBeginnerPar = choice [literalBeginnerPar, normalBeginnerPar] > literalPar :: Parser Elems > literalPar = do cs <- many $ noneOf enders > return $ [Txt cs] %%%%%%%% %%%%%%%% %%%%%%%% %%%%%%%% %%%%%%%% %%%%%%%% %%%%%%%% %%%%%%%% %%%%%%%% > simplePar :: Parser TopElem > simplePar = do ps <- many1 paragraphParts > return $ Par ps > > paragraphParts :: Parser Elem > paragraphParts = pieceMaker openerStrs %%%%%%%% %%%%%%%% %%%%%%%% %%%%%%%% %%%%%%%% %%%%%%%% %%%%%%%% %%%%%%%% %%%%%%%% === Putting it all together == Paragraphs > topLevelPar :: Parser TopElem > topLevelPar = choice [specialBeginnerPar, simplePar] Paragraphs separated by parEnd tokens form a Document > aDoc :: Parser Document > aDoc = (topLevelPar `sepEndBy` parEnd) > "top level paragraphs separated by paragraph endings "++ > "(usually two or more newlines)" ""doc"" of a string is the parsed structured document from the input string or a little document telling where the error was if it fails > doc :: String -> Document > doc input = case runParser aDoc emptyState "" (filter (/='\r') input) of > Left err -> [Par [Txt $ "Parse error at " ++ (show err)]] > Right ps -> postParParse $ filter nonEmpty ps > where nonEmpty (Par p) = p /= [] > nonEmpty (RawBeg _ _ p) = p /= [] > nonEmpty (SpecPar _ ps) = ps/= [] --TODO filter sublists %%%%%%%% %%%%%%%% %%%%%%%% %%%%%%%% %%%%%%%% %%%%%%%% %%%%%%%% %%%%%%%% %%%%%%%% Post-process and collect up like TopElem paragraphs and specialize them, getting rid of all ""RawBeg"" TopElems > postParParse :: [TopElem] -> [TopElem] > postParParse = groupSquash parEq squashTopElems Group-and-squasher utility function: < groupSquash :: (a->a->Bool) -> ([a]->[a]) -> [a] -> [a] < groupSquash eq squasher items = < foldr1 (++) $ map squasher $ groupBy eq items Example usage of this sweet utility function: < let squasher xs = case head xs of < 's' -> "$" < 'p' -> map (const 'q') xs < _ -> xs < in < groupSquash (==) squasher "Mississippi" " ==> Mi$i$iqqi Now back to the Document. Are two TopElems alike? > parEq :: TopElem -> TopElem -> Bool > parEq (Par _) (Par _) = True > parEq (RawBeg _ _ _) (RawBeg _ _ _) = True > parEq _ _ = False Given a list of TopElems of one type (i.e. all Pars or Begs), flatten the Begs and return the Pars or Pres as is > squashTopElems :: [TopElem] -> [TopElem] > squashTopElems ps = > case head ps of > (Par _) -> ps > (RawBeg _ _ _) -> groupSquash begEq begSquash ps > _ -> error$ "Unsupported" Group and squash RawBeg-type TopElems, again with the comparison and squasher functions. > begEq :: TopElem -> TopElem -> Bool > begEq (RawBeg '=' i _) (RawBeg '=' i' _) = i==i' -- heading nesting levels are different blocks > begEq (RawBeg '%' i _) (RawBeg '%' i' _) = i==i' -- comments %% and %%% are different blocks > begEq (RawBeg '<' i _) (RawBeg '<' i' _) = i==i' -- code blocks < and << are different blocks > begEq (RawBeg '>' i _) (RawBeg '>' i' _) = i==i' -- code blocks > and >> are different blocks > begEq (RawBeg ' ' i _) (RawBeg ' ' i' _) = False --i==i' -- ?block quotes ' ' and ' ' are diff't blocks? > begEq (RawBeg '\t' i _) (RawBeg '\t' i' _) = i==i' -- ditto > begEq (RawBeg c _ _) (RawBeg c' _ _) = c==c' -- everything else: nesting won't matter for eq > begEq _ _ = False Given a list of Begs of one type (i.e. all same char), flatten the Begs into a Head1, etc. elem > begSquash :: [TopElem] -> [TopElem] > begSquash ps = > let chopped = map extractEs ps -- # * - : where nesting does something > outline = pairsToOutline $ map extractNestEs ps -- where we need the number value: > -- =, %, <, >, space don't > in case head ps of > (RawBeg ' ' _ _) -> [SpecPar BQuote chopped] > (RawBeg '<' _ _) -> [SpecPar Code chopped] > (RawBeg '>' _ _) -> [SpecPar Code chopped] > (RawBeg '%' _ _) -> [SpecPar Comment chopped] > (RawBeg '"' _ _) -> [SpecPar Pre chopped] > (RawBeg '=' 1 _) -> [SpecPar Head1 chopped] > (RawBeg '=' 2 _) -> [SpecPar Head2 chopped] > (RawBeg '=' 3 _) -> [SpecPar Head3 chopped] > (RawBeg '=' 4 _) -> [SpecPar Head4 chopped] > (RawBeg '=' i _) -> error$ "Head level of 1,2,3, or 4 expected. Found head"++show i++"." > (RawBeg '-' _ _) -> [TreePar BList outline] > (RawBeg '*' _ _) -> [TreePar SList outline] > (RawBeg ':' _ _) -> [TreePar Poetry outline] > (RawBeg '#' _ _) -> [TreePar NList outline] > (RawBeg _ _ _) -> error$ "Unsupported type of special paragraph: " ++ show ps > extractEs :: TopElem -> Elems > extractEs (RawBeg _ _ es) = es > extractEs _ = error "We are only dealing with RawBegs here. This shouldn't happen." > extractNestEs :: TopElem -> (Int, Elems) > extractNestEs (RawBeg _ i es) = (i, es) > extractNestEs _ = error "We are only dealing with RawBegs here. This shouldn't happen." > pairsToOutline :: [(Int, Elems)] -> (Tree Elems) > pairsToOutline = pairListToTree %%%%%%%% %%%%%%%% %%%%%%%% %%%%%%%% %%%%%%%% %%%%%%%% %%%%%%%% %%%%%%%% %%%%%%%% === Restructure RawLnks Turn the handy simple img/lnk notation into sensibly structured data == Flatten Elems = into a string for use as AltTxt to an image > altTextOf :: Elems -> Maybe AltTxt > altTextOf es = case toRawString es of { "" -> Nothing ; x -> Just x } > toRawString :: (Data a, Typeable a) => a -> String > toRawString e = strip $ stringize e Collect and concatenate all strings in the given document or element or toplevel or whatever! > stringize :: (Data a, Typeable a) => a -> String > stringize t = str t > where str :: (Data a, Typeable a) => a -> String > str s = case cast s of > Just s -> s -- stop recursion on a string, hopefully > Nothing -> findLits s > findLits :: (Typeable a, Data a) => a -> String > findLits t = case cast_ent t of -- render literals literally! > Just ent -> case length ent of { 1 -> ent; _ -> entToString ent } > Nothing -> concat (gmapQ stringize t) -- recurse and look for strings > cast_ent :: (Typeable a) => a -> Maybe String > cast_ent e = case cast_elem e of > Just elm -> pulloutEnt elm > Nothing -> Nothing > cast_elem :: (Typeable a) => a -> Maybe Elem > cast_elem = cast > pulloutEnt :: Elem -> Maybe String > pulloutEnt (Ent e) = Just e > pulloutEnt _ = Nothing > entToString :: Entity -> String > entToString e = > case revlookup e specialEntSeqs of > Nothing -> error$ "Unrecognized entity in attempt to convert to raw text: "++ show e > Just s -> s Look up a key in a [(value, key)] environment (specialEntSeqs is shaped this way) > revlookup :: Eq a => a -> [(b, a)] -> Maybe b > revlookup k env = lookup k flipenv > where flipenv = map flip2tup env > flip2tup (a,b) = (b,a) == Strip phantom protocol Throw away the made up protocols such as ""lnk:"" > phantomStrip :: Url -> Url > phantomStrip (Url p u) = if p `elem` phantomProtocols then (Url "" u) else (Url p u) > unphantom :: ProtocolStr -> ProtocolStr > unphantom p = if p `elem` phantomProtocols then "" else p Pull out possible img or target for a link > imgOfUrl, targetUrl :: Elem -> Maybe Url > imgOfUrl (UrlElem (Url p u)) = if p `elem` imageProtocols then Just (Url "" u) else Nothing > imgOfUrl _ = Nothing > targetUrl (UrlElem (Url p u)) = if p `elem` linktoProtocols then Just (Url p u) else Nothing > targetUrl _ = Nothing Ensure that a given element is not a UrlElem so we can pass that along unaltered. > nonUrl :: Elem -> Elems -- singleton or empty list, i.e. [e] or [] > nonUrl (UrlElem _) = [] > nonUrl e = [e] Convert a RawLnk to Img or Lnk, if possible > lnkRawToReal = realLnkToElem . rawLnkToLnk Collect up relevant information in a link grouping. We automate this so that (1) we don't have to be too verbose, and (2) we don't have to remember the order of "arguments". All combinations should "do the right thing". > rawLnkToLnk :: Elem -> Elem > rawLnkToLnk (RawLnk es) = realLnk es [] Nothing Nothing > rawLnkToLnk x = x -- do nothing to non-RawLnk elements > > realLnk :: Elems -> Elems -> (Maybe Url) -> (Maybe Url) -> Elem > -- ran out of input elements, see what we have accumulated as im or target > realLnk [] es im target = > case im of > { Nothing -> case target of Nothing -> RawLnk es > Just t -> let t'@(Url p u) = phantomStrip t in > if es /= [] then Lnk t' es > else Lnk t' [Txt (p++u)] > ; Just i -> Img i target (altTextOf es) } > -- see what the head of input elements is and update our im or target accumulator > realLnk (e:es) tailEs im target = > let im' = imgOfUrl e > targ' = case im' of Nothing -> targetUrl e > _ -> Nothing > e' = nonUrl e -- singleton or empty list, i.e. [e] or [] > in realLnk es (tailEs ++ e') (im `mplus` im') (target `mplus` targ') Finish processing the ""Elem""s with Urls, Imgs, or Lnks to be in final form. > realLnkToElem :: Elem -> Elem > realLnkToElem (UrlElem url) = > let (Url p u) = url in > if p `elem` imageProtocols then Img (Url "" u) Nothing Nothing --"" > else let (Url p' u') = phantomStrip url in UrlElem (Url p' u') > --"" ++p'++u'++ "" > realLnkToElem (Img (Url _ i) targ a) = > -- let altt = case a of Nothing -> "" > -- Just s -> "alt=\""++s++"\"" in > case targ of Nothing -> Img (Url "" i) Nothing a --"" > Just (Url p u) -> Img (Url "" i) (Just (Url (unphantom p) u)) a > -- ""++ > -- " -- altt ++" />" > realLnkToElem (Lnk (Url p u) es) = Lnk (Url p u) (stripWs es) > -- "" ++ strip (elemsToHtmlString es) ++ "" > realLnkToElem x = x -- identity on everything else %%%%%%%% %%%%%%%% %%%%%%%% %%%%%%%% %%%%%%%% %%%%%%%% %%%%%%%% %%%%%%%% %%%%%%%% > test p s = runParser p emptyState "" s Example usage: < test entityElem "<--"