#4320 NORM First D: Need support for hyperlinks
Zarro Boogs per Child
bugtracker at laptop.org
Tue Oct 23 22:32:56 EDT 2007
#4320: Need support for hyperlinks
----------------------------+-----------------------------------------------
Reporter: Eben | Owner: morgs
Type: enhancement | Status: reopened
Priority: normal | Milestone: First Deployment, V1.0
Component: chat-activity | Version:
Resolution: | Keywords:
Verified: 0 |
----------------------------+-----------------------------------------------
Comment(by AlbertCahalan):
Replying to [comment:6 Eben]:
> Just a note on the regexp. It doesn't yet handle trailing characters
> such as periods, closing parens, commas, semicolons, etc. These characters
> are likely part of the syntax of the sentence, and not part of the URL.
This seems to do the job.
{{{
egrep -99 --color \
'((http|ftp)s?://)?(([-a-zA-Z0-9]+[.])+[-a-zA-Z0-9]{2,}|([0-9]{1,3}[.]){3}[0-9]{1,3})(:[1-9][0-9]{0,4})?(/[-a-zA-Z0-9/%~@&_+=;:,.?#]*[a-zA-Z0-9/])?'
}}}
There is a tradeoff to be made. In general the above errs on the side of
choosing something as a URL, so you'd get laptop.org from this sentence.
You'd not get hello.c, because the last label must be at least two
characters, but sugar.py would count ("py" may be a country-code top-level
domain). Usernames and passwords built into the URL are not supported; they
are very rare and often considered bad security practice. Unescaped
non-ASCII characters are not supported either.
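To make the tradeoff concrete, here is a minimal sketch that runs the same pattern over a couple of sentences with Python's `re` module. The names `URL_RE` and `find_urls` are made up for illustration and are not from the ticket. One caveat: Python takes the first alternative that works rather than the longest match, so results for bare IPv4 addresses can differ from what egrep reports.

```python
import re

# Pattern copied from the egrep command above; URL_RE and find_urls are
# hypothetical names used only for this sketch.
URL_RE = re.compile(
    r'((http|ftp)s?://)?'
    r'(([-a-zA-Z0-9]+[.])+[-a-zA-Z0-9]{2,}|([0-9]{1,3}[.]){3}[0-9]{1,3})'
    r'(:[1-9][0-9]{0,4})?'
    r'(/[-a-zA-Z0-9/%~@&_+=;:,.?#]*[a-zA-Z0-9/])?'
)

def find_urls(text):
    """Return every substring the pattern treats as a URL, in order."""
    # The pattern contains capturing groups, so use finditer and group(0)
    # to get whole matches instead of findall's tuples of groups.
    return [m.group(0) for m in URL_RE.finditer(text)]

if __name__ == "__main__":
    samples = [
        "See laptop.org for details, not hello.c, though sugar.py would count.",
        "Visit http://dev.laptop.org/ticket/4320, or (see http://wiki.laptop.org/go/Chat).",
    ]
    for sentence in samples:
        print(find_urls(sentence))
```

In the second sample the trailing comma and the closing paren plus period are left out of the matches, which is the behavior asked for in the comment being replied to.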
Breaking it down into smallish semi-readable chunks:
{{{
optional protocol part (does ftps, not sftp or irc or mailto)
((http|ftp)s?://)?
fully-qualified names and IPv4 addresses are accepted
(([-a-zA-Z0-9]+[.])+[-a-zA-Z0-9]{2,}|([0-9]{1,3}[.]){3}[0-9]{1,3})
port numbers are decimal, 1 to 5 digits (accepts 99999 but not 0377)
(:[1-9][0-9]{0,4})?
this gets the rest, disallowing some trailing punctuation
(/[-a-zA-Z0-9/%~@&_+=;:,.?#]*[a-zA-Z0-9/])?
}}}
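The chunks above can be glued back together and checked one claim at a time. This is only a sketch in Python: the variable names and the `is_whole_url` helper are assumptions for illustration, and `example.org` is a placeholder host. It uses `fullmatch`, so it asks whether a whole candidate string is accepted, not where a URL sits inside a sentence.

```python
import re

# Chunks copied from the breakdown above; all names here are hypothetical.
PROTOCOL = r'((http|ftp)s?://)?'
HOST = r'(([-a-zA-Z0-9]+[.])+[-a-zA-Z0-9]{2,}|([0-9]{1,3}[.]){3}[0-9]{1,3})'
PORT = r'(:[1-9][0-9]{0,4})?'
PATH = r'(/[-a-zA-Z0-9/%~@&_+=;:,.?#]*[a-zA-Z0-9/])?'

URL_PARTS_RE = re.compile(PROTOCOL + HOST + PORT + PATH)

def is_whole_url(candidate):
    """True if the entire candidate string is accepted by the pattern."""
    return URL_PARTS_RE.fullmatch(candidate) is not None

if __name__ == "__main__":
    candidates = [
        "https://dev.laptop.org/ticket/4320",  # protocol, host, path
        "ftps://ftp.example.org:2121/pub/",    # ftps is accepted
        "sftp://example.org",                  # sftp is not
        "example.org:99999",                   # 5-digit port is accepted
        "example.org:0377",                    # leading zero is not
        "192.168.0.1:8080",                    # IPv4 address with port
        "example.org/page.",                   # trailing period is refused
    ]
    for candidate in candidates:
        print(candidate, is_whole_url(candidate))
```

Keeping the four parts as separate strings makes it easy to loosen one of them later, for example to add more protocols, without touching the rest.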
--
Ticket URL: <https://dev.laptop.org/ticket/4320#comment:7>
One Laptop Per Child <https://dev.laptop.org>
OLPC bug tracking system