forked from brighton36/libcraigscrape
-
Notifications
You must be signed in to change notification settings - Fork 0
/
README
89 lines (59 loc) · 2.74 KB
/
README
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
= libcraigscrape - A craigslist URL-scraping support Library
An easy library to do the heavy lifting between you and Craigslist's posting database. Given a URL, libcraigscrape will
follow links, scrape fields, and make ruby-sense out of the raw html from craigslist's servers.
For more information, head to the {craiglist monitoring}[http://www.derosetechnologies.com/community/libcraigscrape] help section of our website.
== craigwatch
libcraigscrape was primarily developed to support the included craigwatch[link:files/bin/craigwatch.html] script. See the included craigwatch script for
examples of libcraigscape in action, and (hopefully) to serve an immediate craigscraping need.
== Installation
Install via RubyGems:
sudo gem install libcraigscrape
== Usage
=== Scrape Craigslist Listings since Sep 10
On the 'miami.craigslist.org' site, using the query "search/sss?query=apple"
require 'rubygems'
require 'libcraigscrape'
require 'date'
require 'pp'
miami_cl = CraigScrape.new 'us/fl/miami'
miami_cl.posts_since(Time.parse('Sep 10'), 'search/sss?query=apple').each do |post|
pp post
end
=== Scrape Last 225 Craigslist Listings
On the 'miami.craigslist.org' under the 'apa' category
require 'rubygems'
require 'libcraigscrape'
require 'pp'
i=1
CraigScrape.new('us/fl/miami').each_post('apa') do |post|
break if i > 225
i+=1
pp post
end
=== Multiple site with multiple section/search enumeration of posts
In Florida, with the exception of 'miami.craigslist.org' & 'keys.craigslist.org' sites, output each post in
the 'crg' category and for the search 'artist needed'
require 'rubygems'
require 'libcraigscrape'
require 'pp'
non_sfl_sites = CraigScrape.new('us/fl', '- us/fl/miami', '- us/fl/keys')
non_sfl_sites.each_post('crg', 'search/sss?query=artist+needed') do |post|
pp post
end
=== Scrape Single Craigslist Posting
This grabs the full details under the specific post http://miami.craigslist.org/mdc/sys/1140808860.html
require 'rubygems'
require 'libcraigscrape'
post = CraigScrape::Posting.new 'http://miami.craigslist.org/mdc/sys/1140808860.html'
puts "(%s) %s:\n %s" % [ post.post_time.strftime('%b %d'), post.title, post.contents_as_plain ]
=== Scrape Single Craigslist Listing
This grabs the post summaries of the single listings at http://miami.craigslist.org/search/sss?query=laptop
require 'rubygems'
require 'libcraigscrape'
listing = CraigScrape::Listings.new 'http://miami.craigslist.org/search/sss?query=laptop'
puts 'Found %d posts for the search "laptop" on this page' % listing.posts.length
== Author
- Chris DeRose ([email protected])
- DeRose Technologies, Inc. http://www.derosetechnologies.com
== License
See COPYING[link:files/COPYING.html]