Problem with nested loop #18

Open
phoebebright opened this issue Jun 18, 2012 · 0 comments

This could be a user error, but I have tried every permutation I can think of without success.
I'm using the version of scrapemark.py updated on Aug 11, 2011.

Here is an example. If I pull the nested part out and manually split it by <br>, then scrapemark processes each line correctly, but the nested version only finds the first match.

from scrapemark import scrape

src = '''
\n\n\n\n\n\n\t\n\n\t\t\n\n<style type="text/css">\n\n\n\n</style>\n\n\n\n<script type="text/javascript" src="http://webhost.bridgew.edu/etribou/layouts/javascript/ruthsarian_utilities.js"></script>\n\n<script type="text/javascript">\n\n\n\n</script>\n\n\n\n\n\n\n\n\n\n<title>Daily Naps</title>\n\n\n\n\n\n\t\t\n\n

\t\n\n\n\n\t
\t\n\n\t\t\n\ufeff\t\tDailyNaps logo\n\ufeff\t\t
\n\t\t\t\t\n\t\t
\n\t
\n\n\t
\n\t\t\n\ufeff\t\tHome\n\n\t\tHorse Racing Results\n\n\t\tPrevious Results\n\n\t\tPrevious Racecards\n\t\t\n\t\tDailynaps Software\n\n\t\tCheck Horse Odds\n\n\t\tBetting Links\n\n\t\tContact\n\t\t\n\t\tAffiliates\t\t\n\t\t\n\n\t\t
\n\n\t
\n\n\n\n\t
\n\n\t\t
\n\n\t\t\t
\n\n\t\t\t\t
\n\n\t\t\t\t\t
\n\n\t\t\t\t\t\t\n\n\n\n

UK Horse Racing Results

\n\n


Sunday, 10 June 2012


\n

    <div style=\'font-weight:bold\'>Curragh</div>\n
    <b>2:20 : </b>2 Gale Force Ten (J P O\'Brien, 7-2 ); 5 Leitir Mor (K J Manning, 11-10 fav); 4 Hard Yards (C D Hayes, 16-1 ); 8 ran. 6 Newberry Hill (F M Berry, 11-4 2nd-fav); <br>
    <b>2:50 : </b>10 Alsium (C D Hayes, 7-1 ); 3 Cape Of Approval (W Lordan, 2-1 fav); 4 Flying Doha (W J Lee, 7-2 2nd-fav); 15 ran.<br>
    <b>3:20 : </b>7 Kateeva (L F Roche, 14-1 ); 5 Battleroftheboyne (B A Curtis, 12-1 ); 10 Erins Gal (R P Cleary, 20-1 ); 12 ran. 2 Lake George (R P Whelan, 5-1 joint-fav);  3 Allegra Tak (P J Smullen, 5-1 joint-fav); <br>
    <b>3:50 : </b>4 Sharestan (N G McCullagh, 8-11 fav); 2 Defining Year (S Foley, 8-1 ); 7 ran. 7 Learn (C O\'Donoghue, 3-1 2nd-fav); <br>

    <br></p>\n<br><br></font></div>\n\n\n\n\n<!--- middle (main content) column end -->\n\t\t\t\t\t\t<hr class="hide">\n\t\t\t\t\t</div>\n\t\t\t\t</div>\n\t\t\t\t<div id="leftColumn">\n\t\t\t\t\t<div class="inside">\n\t\t\t\t\t\t<!--- left column begin -->\n\t\t\t\t\t\t<div class="vnav">\n<center>\n<iframe allowtransparency="true" src="http://media.paddypower.com/ad.aspx?bid=3079&pid=10060697" \nwidth="120" height="600" marginwidth="0" marginheight="0" hspace="0" vspace="0" \nframeborder="0" scrolling="no"></iframe>\n</center>\n\n\n\t\t\t\t\t\t\t<br />\n\t\t\t\t\t\t\t<br />\n\ufeff<center>\t\t\t\t\t\t\n\n<p></p>\n<p></p>\n\n<a href="http://media.paddypower.com/redirect.aspx?pid=10060697&bid=4403">\n<img src="http://media.paddypower.com/renderimage.aspx?pid=10060697&bid=4403" border=0></img ></a>\n\n<p></p>\n<p></p>\n\n</center>\n<br />\n<br />\n\t\t\t\t\t\t</div>\n\t\t\t\t\t\t<!--- left column end -->\n\t\t\t\t\t\t<hr class="hide">\n\t\t\t\t\t</div>\n\t\t\t\t</div>\n\t\t\t\t<div class="clear"></div>\n\t\t\t</div>\n\t\t\t<div id="rightColumn">\n\t\t\t\t<div class="inside">\n\t\t\t\t\t<!--- right column begin -->\n<p></p>\n<center>\n<iframe src="http://serve.williamhill.com/promoLoadDisplay?member=jpowell79&campaign=DEFAULT&channel=DEFAULT&zone=1471696800&lp=0" style="height:600px;width:120px;" frameborder="0" scrolling="no" MARGINWIDTH="0" MARGINHEIGHT="0" ></iframe>\n</center>\n<p></p>   \n\t\t\t\t   <br />\n\t\t\t\t   <br />\n\n<center>\t\t\t\t\t\t\n\n<p></p>\n<p></p>\n\n<a href="http://media.paddypower.com/redirect.aspx?pid=10060697&bid=3519">\n<img src="http://media.paddypower.com/renderimage.aspx?pid=10060697&bid=3519" border=0></img ></a>\n\n<p></p>\n<p></p>\n\n</center><br />\n<br />\n\t\t\t\t\t<!--- right column end -->\n\t\t\t\t\t<hr class="hide">\n\t\t\t\t</div>\t\t\t\t\n\t\t\t</div>\n\t\t\t<div class="clear"></div>\n\t\t</div>\n\t</div>\t\t\t\n\t<div id="footer" class="inside">\n\t\t<!-- footer begin -->\n\ufeff\t\t<a href="index.php">Home</a>\n\n\t\t<a 
href="results.php">Horse Racing Results</a>\n\n\t\t<a href="previous-results.php">Previous Results</a>\n\n\t\t<a href="previous-racecards.php">Previous Racecards</a>\n\t\t\n\t\t<a href="strategy.php">Dailynaps Software</a>\n\n\t\t<a href="free_bets.php">Check Horse Odds</a>\n\n\t\t<a href="links.php">Betting Links</a>\n\n\t\t<a href="contact.php">Contact</a>\n\t\t\n\t\t<a href="affiliate.php">Affiliates</a>\t\t<!-- footer end -->\n\t\t<hr class="hide">\n\t</div>\n</div>\n</body>\n</html>\n\n
    '''

THIS ONLY RETURNS THE FIRST MATCH

results = scrape("""

UK Horse Racing Results



{{date}}
{*
{{ [course] }}

{* {{h}}:{{m}} : {{first}}; {{second}}; {{third}}; {{n}} ran
*}
*}

""",
html = src)

print results

THIS WORKS

results = scrape("""

UK Horse Racing Results



{{date}}
{*
{{ [course] }}

{{ results|html }}
*}

""",
html = src)

src = results["results"].replace("\n", "")

# Split the captured HTML on <br> so each race line is scraped on its own
x = src.split("<br>")
for item in x:
    r = scrape("""
            <b>{{h}}:{{m}} :</b> {{first}}; {{second}}; {{third}}; {{n}} ran
            """,
        item)
    print r

print results

--------- RESULTS -----

{'third': u'4 Hard Yards (C D Hayes, 16-1 )', 'h': u'2', 'm': u'20', 'n': u'8', 'course': [u'Curragh'], 'second': u'5 Leitir Mor (K J Manning, 11-10 fav)', 'date': u'Sunday, 10 June 2012', 'first': u"2 Gale Force Ten (J P O'Brien, 7-2 )"}
{'third': u'4 Hard Yards (C D Hayes, 16-1 )', 'h': u'2', 'm': u'20', 'n': u'8', 'second': u'5 Leitir Mor (K J Manning, 11-10 fav)', 'first': u"2 Gale Force Ten (J P O'Brien, 7-2 )"}
{'third': u'4 Flying Doha (W J Lee, 7-2 2nd-fav)', 'h': u'2', 'm': u'50', 'n': u'15', 'second': u'3 Cape Of Approval (W Lordan, 2-1 fav)', 'first': u'10 Alsium (C D Hayes, 7-1 )'}
{'third': u'10 Erins Gal (R P Cleary, 20-1 )', 'h': u'3', 'm': u'20', 'n': u'12', 'second': u'5 Battleroftheboyne (B A Curtis, 12-1 )', 'first': u'7 Kateeva (L F Roche, 14-1 )'}
None
None
None
None
None
{'date': u'Sunday, 10 June 2012', 'course': [u'Curragh'], 'results': "\n\n 2:20 : 2 Gale Force Ten (J P O'Brien, 7-2 ); 5 Leitir Mor (K J Manning, 11-10 fav); 4 Hard Yards (C D Hayes, 16-1 ); 8 ran. 6 Newberry Hill (F M Berry, 11-4 2nd-fav);
\n 2:50 : 10 Alsium (C D Hayes, 7-1 ); 3 Cape Of Approval (W Lordan, 2-1 fav); 4 Flying Doha (W J Lee, 7-2 2nd-fav); 15 ran.
\n 3:20 : 7 Kateeva (L F Roche, 14-1 ); 5 Battleroftheboyne (B A Curtis, 12-1 ); 10 Erins Gal (R P Cleary, 20-1 ); 12 ran. 2 Lake George (R P Whelan, 5-1 joint-fav); 3 Allegra Tak (P J Smullen, 5-1 joint-fav);
\n 3:50 : 4 Sharestan (N G McCullagh, 8-11 fav); 2 Defining Year (S Foley, 8-1 ); 7 ran. 7 Learn (C O'Donoghue, 3-1 2nd-fav);
\n\n

\n

"}
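
For comparison, here is a minimal sketch of the same split-then-parse workaround written with only the standard `re` module, assuming the race lines are separated by `<br>` tags as they are in the sample HTML. It is not scrapemark's API and does not explain the nested-loop behaviour; `RACE_RE`, `parse_races`, and `sample` are hypothetical names introduced for illustration, and the regex assumes every line has exactly three placed horses before "N ran".

```python
import re

# Hypothetical helper pattern (not part of scrapemark): one race line of the form
# <b>H:MM : </b>first; second; third; N ran
RACE_RE = re.compile(
    r"<b>\s*(?P<h>\d{1,2}):(?P<m>\d{2})\s*:\s*</b>\s*"
    r"(?P<first>[^;]+);\s*(?P<second>[^;]+);\s*(?P<third>[^;]+);\s*"
    r"(?P<n>\d+)\s+ran"
)


def parse_races(fragment):
    """Split an HTML fragment on <br> and parse each race line independently."""
    races = []
    for line in re.split(r"<br\s*/?>", fragment):
        match = RACE_RE.search(line)
        if match:  # skip empty pieces and lines that do not fit the pattern
            races.append({k: v.strip() for k, v in match.groupdict().items()})
    return races


# Two race lines copied from the sample HTML in this issue
sample = (
    "<b>2:20 : </b>2 Gale Force Ten (J P O'Brien, 7-2 ); "
    "5 Leitir Mor (K J Manning, 11-10 fav); 4 Hard Yards (C D Hayes, 16-1 ); "
    "8 ran. 6 Newberry Hill (F M Berry, 11-4 2nd-fav); <br>"
    "<b>2:50 : </b>10 Alsium (C D Hayes, 7-1 ); "
    "3 Cape Of Approval (W Lordan, 2-1 fav); "
    "4 Flying Doha (W J Lee, 7-2 2nd-fav); 15 ran.<br>"
)

for race in parse_races(sample):
    print(race)
```

As with the scrapemark workaround, a line with fewer than three placed horses (such as the 3:50 race) would not match this pattern and is skipped.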
