Wednesday April 24, 2002
I just wanted to put this up here so that I can show what I did last night in case it looks like I'm not doing anything (I'm talking to myself in the future by the way. I will forget how busy I was.)
This script, txt2link.py, is used by my squidparser to make links out of all those email addresses and URLs in the squid events. Now it's new, it's improved, it's da da dum.... SUPERUSEFUL!!!!
"""
This was adapted from vagueurl.py -- regular expression to match
informal URLs in plain text.

This doesn't do an exact job (it doesn't parse the complete syntax of
URLs) but it should find URLs that at least start right. It looks for
the start of an address then gobbles up as many legal URL characters
as it can find.

originally by Glyn Webster <glyn@ninz.org.nz> 1999-04-27
now by dave primmer http://primco.org
"""
import re

pattern = r'''
    (
      ( \w | - | % )+ @   # email address prefix (e.g. "glyn@")
    | \w+ ://             # or protocol prefix (e.g. "http://")
    | news:               # or "news:" prefix (special case: no "//")
    | mailto:             # or "mailto:" prefix (also no "//")
    | www \.              # lazy typists leave off common prefixes
    | ftp \.
    )                     # then
    [^\\{}|[\]^<>"'\s]*        # the rest are any characters allowed in a URL.
    [^\\{}|[\]^<>"'\s.,;?:!]   # it mustn't end in a punctuation mark or this would
                               # match this wrong: "Www.w3.org, ftp.simtel.net."
'''

def vagueurl(url):
    """
    Add an appropriate prefix to an informal URL.
    url is a string (the callers below convert the match object first).
    """
    if re.match(r"\w+:", url):
        # Has a prefix already, leave it alone.
        return url
    elif re.match(r"(\w|-|%)+@", url):
        # Starts like an email address.
        return "mailto:" + url
    elif url[:3].lower() == 'ftp':
        # Starts like an FTP address (the pattern is case-insensitive, so this is too).
        return "ftp://" + url
    else:
        # Assume it's a WWW address.
        return "http://" + url

def URL2htmllink(url):
    """
    Wraps html link code around url.
    url is a match object which is converted to a string.
    """
    url = url.group(0)
    return '<a href="%s">%s</a>' % (vagueurl(url), url)

def URL2xmllink(url):
    """
    Wraps xml link code around url.
    url is a match object which is converted to a string.
    """
    url = url.group(0)
    return '<link><address>%s</address><text>%s</text></link>' % (vagueurl(url), url)

def linktext(text, mode='html'):
    """ Replace every informal URL in text with an html (default) or xml link. """
    regexp = re.compile(pattern, re.IGNORECASE | re.VERBOSE)
    if mode == 'xml':
        return re.sub(regexp, URL2xmllink, text)
    else:
        return re.sub(regexp, URL2htmllink, text)

if __name__ == '__main__':
    sampletext = """dsik sdksils difjsk http://primco.org akd is sss.sss.www www.ddd another and nobody@nowhere.com aid """
    # Parenthesised so it runs under both the old print statement and the print function.
    print(linktext(sampletext))
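For future me, here's a small sketch of why that last character class in the pattern matters. It's a cut-down stand-in, not the script itself: demo_pattern, add_prefix and to_link are names made up for this example, it only knows the "www." and "ftp." prefixes, and it assumes Python 3.

```python
import re

# Cut-down version of the pattern: "www." / "ftp." prefixes only (an
# assumption made to keep the example short).
demo_pattern = re.compile(r'''
    (?: www \. | ftp \. )      # lazy-typist prefixes
    [^\\{}|[\]^<>"'\s]*        # greedily take characters allowed in a URL
    [^\\{}|[\]^<>"'\s.,;?:!]   # but the last one must not be punctuation
''', re.IGNORECASE | re.VERBOSE)

def add_prefix(url):
    # Hypothetical helper: same idea as vagueurl, minus the email case.
    if url[:3].lower() == 'ftp':
        return "ftp://" + url
    return "http://" + url

def to_link(match):
    # re.sub calls this once per match and inserts whatever it returns.
    url = match.group(0)
    return '<a href="%s">%s</a>' % (add_prefix(url), url)

result = demo_pattern.sub(to_link, "See Www.w3.org, ftp.simtel.net.")
print(result)
# See <a href="http://Www.w3.org">Www.w3.org</a>, <a href="ftp://ftp.simtel.net">ftp.simtel.net</a>.
```

The greedy middle part first swallows the trailing comma and full stop, then the regex engine backtracks until the final character passes the stricter class, so the punctuation stays outside the link.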