


Regex request (sorry I didn't explain the problem clearly at first, please bear with me): extracting links from HTML source




Posted: 2007-08-07 20:51:13, original poster
imgurl=http://cqimg.focus.cn/upload/photos/40474/lpbgv5ip.jpg&imgrefurl=http://cq.focus.cn/msgview/40474/24554469.html&start=13&h=709&w=900&sz=112&tbnid=k0rnqvyz5qrfem:&tbnh=115&tbnw=146&hl=zh-cn&prev=/images%3fq%3d%25e5%25ae%25b6%25e8%25a3%2585%26gbv%3d2%26svnum%3d10%26hl%3dzh-cn%26newwindow%3d1%26ie%3dutf-8 target=_blank><img src=http://tbn0.google.com/images?q=tbn:k0rnqvyz5qrfem: width=146 height=115></a></td><td align=center valign=bottom width=23% style="padding-top:1px;"><a href=/imgres?

From this I want http://cqimg.focus.cn/upload/photos/40474/lpbgv5ip.jpg

<img src=http://tbn0.google.com/images?q=tbn:7ljws6uejb0qym: width=131 height=97></a></td></tr><tr><td valign=top align=center width=23% style="padding-bottom:1px;"><font face=arial,sans-serif size=-1>图片:<b>家装</b>实景<br>900 x 709 - 112k&nbsp;-&nbsp;jpg<br><font color=#008000>cq.focus.cn</font></font></td><td valign=top align=center width=23% style="padding-bottom:1px;">]

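From this I want http://tbn0.google.com/images?q=tbn:7ljws6uejb0qym:

These two are what I want to extract; the other links are mostly of these same two types.
Posted: 2007-08-07 20:56:00, reply #1, score: 0
With respect, the OP still hasn't described the problem clearly.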

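For the first one:

string yourStr = ...............;
// Start right after an '=', run up to a quote, '&', whitespace or '>', and require the match to end in ".jpg"
MatchCollection mc = Regex.Matches(yourStr, @"(?<==)[^'""&\s>]*(?<=\.jpg)", RegexOptions.IgnoreCase);
foreach (Match m in mc)
{
    richTextBox2.Text += m.Value + "\n";
}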


For the second one:

string yourStr = ...............;
// The src value may be wrapped in ' or " or be unquoted; the named group "img" holds the address
MatchCollection mc = Regex.Matches(yourStr, @"<img\s*src=(['""]?)(?<img>[^'""\s>]*)\1[^>]*>", RegexOptions.IgnoreCase);
foreach (Match m in mc)
{
    richTextBox2.Text += m.Groups["img"].Value + "\n";
}

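For now I've written these against your examples. If some cases don't match, post concrete samples, or explain exactly what identifies the result you want. Is the imgurl= in front of it fixed, or does the address itself follow a pattern? Is the extension always jpg? What marks the end: the "&" character here, or could it also be a space, and so on? Without that information, whatever I give you may fit only these two examples and not be general. Please describe the requirements and the pattern in detail.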
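If the answers turn out to be "yes, imgurl= is a fixed prefix" and "the value always ends at the next &", the first kind of address could be picked out without relying on the .jpg extension at all. A minimal sketch under exactly those two assumptions (the class and method names are made up for illustration and are not from this thread):

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of the original posts.
public static class ImgUrlSketch
{
    // Assumption: the wanted address always follows the literal "imgurl="
    // and ends at the next '&' (or at whitespace, a quote or '>' if no '&' follows).
    private static readonly Regex ImgUrl = new Regex(
        @"(?<=\bimgurl=)[^&\s'"">]+", RegexOptions.IgnoreCase);

    public static List<string> Extract(string html)
    {
        var urls = new List<string>();
        foreach (Match m in ImgUrl.Matches(html))
        {
            urls.Add(m.Value);
        }
        return urls;
    }
}
```

Calling ImgUrlSketch.Extract(yourStr) returns one entry per imgurl= parameter. The imgrefurl= parameter is not picked up, because the lookbehind needs the literal text imgurl= immediately before the match.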
Posted: 2007-08-07 21:12:16, reply #2, score: 0
Or let me give the OP an example:

<a href="www.csdn.net">csdn</a>
<a href='www.csdn.net/blog'>csdn blog</a>
<a href='http://writeblog.csdn.net/' target='_blank'>我的blog</a>
<div><a rel="mz" href=http://intel.csdn.net/>intel专区</a></div>
<div><a rel="mz" href="http://www.csdn.net/ijs/">ijs专区</a></div>

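From the content above I want to extract every URL. The rule they follow is:
inside an <a...> tag, take what comes after href=. The href= may be followed by ', by ", or by the URL directly, so the text to extract comes after href= and is either enclosed in '' or "", or runs to the first whitespace or to ">".
Everything apart from this rule may vary. Given these requirements, I can do this: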

string yourStr = ..................;
// Group 1 remembers the opening quote (if any) and \1 requires the same character to close the value
MatchCollection mc = Regex.Matches(yourStr, @"<a[^>]*?href=(['""]?)(?<url>[^'""\s>]*)\1[^>]*>", RegexOptions.IgnoreCase);
foreach (Match m in mc)
{
    richTextBox2.Text += m.Groups["url"].Value + "\n";
}

Output:
www.csdn.net
www.csdn.net/blog
http://writeblog.csdn.net/
http://intel.csdn.net/
http://www.csdn.net/ijs/


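Does this make it clear what kind of pattern the OP needs to describe?
Posted: 2007-08-07 21:17:04, reply #3, score: 0
Bump.
Posted: 2007-08-07 21:20:58, reply #4, score: 0
Let me try it. Sigh, it's my own fault for not learning this properly; very frustrating.
What I'm actually doing is downloading the images from Google Images, so I need the image link addresses. I've already downloaded the HTML;
all that's left is finding the image addresses in it.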
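Posted: 2007-08-07 21:29:32, reply #5, score: 0
I searched Google with "面包" (bread) as the keyword, and in the page source I only see addresses of the form
http://www.0618.net/upfiles/2006-8/200682117838602.jpg
The other kind appears to be assembled by concatenation. If your source contains both kinds of address, give me a URL to look at.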
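Posted: 2007-08-07 21:34:28, reply #6, score: 0
http://images.google.cn/images?hl=zh-cn&q=%e5%ae%b6%e8%a3%85&gbv=2
This is the address.
You have to download the page first; the thumbnail has one address and the full-size image has another.
Posted: 2007-08-07 21:42:49, reply #7, score: 0
How did the OP get the page source? If it was with a program, post the code. In the source I'm looking at now, the images all appear in this form: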

dyn.img("http://www.fotoer.com/sanweisheji/230035149.htm&h=372&w=500&sz=38&hl=zh-cn&start=19"," target=_blank","d7linar8zvodim:","http://www.ab100.com/modata/i8/2005110077.jpg","130","97","室内效果图,\x3cb\x3e家装\x3c/b\x3e效果图,效果图设计欣赏","","","500 x 372 - 38k","jpg","www.fotoer.com","","","http://tbn0.google.com/images","0");

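The first kind,
http://www.ab100.com/modata/i8/2005110077.jpg
can be extracted directly. The second kind,
http://tbn0.google.com/images?q=tbn:d7linar8zvodim:
can be built by extracting
http://tbn0.google.com/images
and
d7linar8zvodim:
and concatenating them with ?q=tbn: in between.

If the second kind of address appears directly in the source you obtained, post the code you use to fetch the source.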
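The concatenation described above can be sketched by collecting every quoted argument of a dyn.img(...) call and picking them by position. This is only an illustration based on the single sample call shown here. It assumes the third argument is the thumbnail id, the fourth is the full-size address, the second-to-last is the thumbnail base address, and that no argument contains an embedded double quote. The class and method names are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of the original posts.
public static class DynImgSketch
{
    // One dyn.img(...) call; every quoted argument becomes a capture of "arg".
    private static readonly Regex Call = new Regex(
        @"dyn\.img\((?:\s*""(?<arg>[^""]*)""\s*,?)+\)");

    // Returns { fullSizeUrl, thumbnailUrl } for each call found.
    public static List<string[]> ExtractPairs(string source)
    {
        var pairs = new List<string[]>();
        foreach (Match m in Call.Matches(source))
        {
            CaptureCollection args = m.Groups["arg"].Captures;
            if (args.Count < 5) continue; // too few arguments to hold both parts

            string id = args[2].Value.Trim();                 // e.g. d7linar8zvodim:
            string full = args[3].Value.Trim();               // full-size image address
            string thumbBase = args[args.Count - 2].Value.Trim();

            pairs.Add(new[] { full, thumbBase + "?q=tbn:" + id });
        }
        return pairs;
    }
}
```

Reading the arguments through Captures keeps the pattern short and makes the positional assumptions explicit in the code, so they are easy to adjust if the page layout turns out to differ.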
Posted: 2007-08-07 21:47:21, reply #8, score: 0
I fetched the source with a program.
Posted: 2007-08-07 21:54:44, reply #9, score: 0
imgurl=http://www.a963.com/com/15989/photo/max/20051110151319.jpg&imgrefurl=http://www.a963.com/a963/works/worksdetail.php%3fid%3d2662&start=2&h=600&w=800&sz=80&tbnid=srti0-orst1z3m:&tbnh=107&tbnw=143&hl=zh-cn&prev=/images%3fq%3d%25e5%25ae%25b6%25e8%25a3%2585%26gbv%3d2%26svnum%3d10%26hl%3dzh-cn%26newwindow%3d1%26ie%3dutf-8 target=_blank><img src=http://tbn0.google.com/images?q=tbn:srti0-orst1z3m: width=143 height=107></a></td><td align=center valign=bottom width=23% style="padding-top:1px;"><a href=/imgres?imgurl=http://www3.allfang.com/allfang/upload/fitment_images/image121139.jpg&imgrefurl=http://newsdetail.cd.allfang.com/2006-09/7373_1.html&start=3&h=342&w=400&sz=21&tbnid=pkllflyscxexjm:&tbnh=106&tbnw=124&hl=zh-cn&prev=/images%3fq%3d%25e5%25ae%25b6%25e8%25a3%2585%26gbv%3d2%26svnum%3d10%26hl%3dzh-cn%26newwindow%3d1%26ie%3dutf-8 target=_blank><img src=http://tbn0.google.com/images?q=tbn:pkllflyscxexjm: width=124 height=106></a></td><td align=center valign=bottom width=23% style="padding-top:1px;"><a href=/imgres?imgurl=http://www.allfang.com/allfang/upload/news_images/image124307.jpg&imgrefurl=http://magazine.bj.allfang.com/allfangzk.html&start=4&h=285&w=400&sz=30&tbnid=9gve8gm8jbem3m:&tbnh=88&tbnw=124&hl=zh-cn&prev=/images%3fq%3d%25e5%25ae%25b6%25e8%25a3%2585%26gbv%3d2%26svnum%3d10%26hl%3dzh-cn%26newwindow%3d1%26ie%3dutf-8 target=_blank><img src=http://tbn0.google.com/images?q=tbn:9gve8gm8jbem3m: width=124 height=88></a></td></tr><tr><td valign=top align=center width=23% style="padding-bottom:1px;"><font face=arial,sans-serif size=-1>科技<b>家装</b>渐渐成为业界发展的主流方向之 <b>...
</b><br>500 x 375 - 31k&nbsp;-&nbsp;jpg<br><font color=#008000>www.eju.cn</font></font></td><td valign=top align=center width=23% style="padding-bottom:1px;"><font face=arial,sans-serif size=-1>作品名称:某<b>家装</b><br>800 x 600 - 80k&nbsp;-&nbsp;jpg<br><font color=#008000>www.a963.com</font></font></td><td valign=top align=center width=23% style="padding-bottom:1px;"><font face=arial,sans-serif size=-1>成都43<b>家装</b>公司签《透明<b>家装</b>公约》<br>400 x 342 - 21k&nbsp;-&nbsp;jpg<br><font color=#008000>newsdetail.cd.allfang.com</font></font></td><td valign=top align=center width=23% style="padding-bottom:1px;"><font face=arial,sans-serif size=-1>工薪族<b>家装</b>避免十大误区装出温馨个性 <b>...</b><br>400 x 285 - 30k&nbsp;-&nbsp;jpg<br><font color=#008000>magazine.bj.allfang.com</font></font></td></tr></table> <table align=center border=0 cellpadding=2 cellspacing=0 width=100%><tr><td align=center valign=bottom width=23% style="padding-top:1px;"><a href=/imgres?imgurl=http://www.shide.com:1080/shiqi.jpg&imgrefurl=http://www.shide.com:1080/sparlee/index_sjzs.jsp&start=5&h=357&w=500&sz=59&tbnid=s5yovlsremngam:&tbnh=93&tbnw=130&hl=zh-cn&prev=/images%3fq%3d%25e5%25ae%25b6%25e8%25a3%2585%26gbv%3d2%26svnum%3d10%26hl%3dzh-cn%26newwindow%3d1%26ie%3dutf-8 target=_blank><img src=http://tbn0.google.com/images?q=tbn:s5yovlsremngam: width=130 height=93></a></td><td align=center valign=bottom width=23% style="padding-top:1px;"><a href=/imgres?
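Posted: 2007-08-07 21:55:15, reply #10, score: 0
If it's convenient, post the code that fetches the source; if not, send it to me by site message, using the old-style message system.

The code below works on the source string I got directly from View Source. Note that it yields twenty-one pairs rather than the twenty shown on the page; the last one is not displayed on the page.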

string yourStr = ................;
// "e" = third argument (thumbnail id), "url" = fourth (full-size address), "p" = second-to-last (thumbnail base address)
MatchCollection mc = Regex.Matches(yourStr, @"dyn\.img\((""[^""]*"",){2}""(?<e>[^""]*)"",""(?<url>[^""]*)""(,""[^""]*"")*,""(?<p>[^""]*)"",""[^""]*""\);", RegexOptions.IgnoreCase);
foreach (Match m in mc)
{
    richTextBox2.Text += m.Groups["url"].Value + "\n";
    richTextBox2.Text += m.Groups["p"].Value + "?q=tbn:" + m.Groups["e"].Value + "\n";
}

PS: I have no network access during the day; I can answer your questions until midnight.
Posted: 2007-08-07 21:55:33, reply #11, score: 0
http://tbn0.google.com/images?q=tbn:srti0-orst1z3m:   http://www.a963.com/com/15989/photo/max/20051110151319.jpg

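These are the two formats I want to extract.
Posted: 2007-08-07 21:57:06, reply #12, score: 0
So what I see in the browser isn't right, and only what the program downloads is right?
Posted: 2007-08-07 21:58:18, reply #13, score: 0
There are 20 pairs in total.
Posted: 2007-08-07 21:59:00, reply #14, score: 0
It's 过客 again~~
Posted: 2007-08-07 22:04:24, reply #15, score: 0
Looking at what you posted, a single record should have this form: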

<td align=center valign=bottom width=23% style="padding-top:1px;"><a href=/imgres?imgurl=http://www3.allfang.com/allfang/upload/fitment_images/image121139.jpg&imgrefurl=http://newsdetail.cd.allfang.com/2006-09/7373_1.html&start=3&h=342&w=400&sz=21&tbnid=pkllflyscxexjm:&tbnh=106&tbnw=124&hl=zh-cn&prev=/images%3fq%3d%25e5%25ae%25b6%25e8%25a3%2585%26gbv%3d2%26svnum%3d10%26hl%3dzh-cn%26newwindow%3d1%26ie%3dutf-8 target=_blank><img src=http://tbn0.google.com/images?q=tbn:pkllflyscxexjm: width=124 height=106></a></td>

Try the code below and see whether the results are right:

string yourStr = ..............;
// "img" = full-size address (up to the first '&'); "url" = thumbnail src (up to the next whitespace)
MatchCollection mc = Regex.Matches(yourStr, @"<a\s*href=/imgres\?imgurl=(?<img>[^&]*)&[^>]*>\s*<img\s*src=(?<url>\S*)[^>]*>", RegexOptions.IgnoreCase);
foreach (Match m in mc)
{
    richTextBox2.Text += m.Groups["img"].Value + "\n";
    richTextBox2.Text += m.Groups["url"].Value + "\n";
}
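Posted: 2007-08-07 22:05:03, reply #16, score: 0
To: sassyboy(我要一座房子,面朝大海,春暖花开...)

Congratulations on earning your star~
Posted: 2007-08-07 22:15:50, reply #17, score: 0
I've closed the thread and awarded the points, thank you. Could you leave your QQ number, regex master? My QQ: 7901952
Posted: 2007-08-07 22:28:46, reply #18, score: 0
My situation is unusual lately: no network during the day, so I can only get online at night. I'd rather not post my QQ number here; if you really need it, send me a site message and I'll tell you.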

