-
Notifications
You must be signed in to change notification settings - Fork 10
/
chapter3.html
198 lines (138 loc) · 10.4 KB
/
chapter3.html
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
<!doctype html>
<html>
<head>
<title>Intermediate Ruby</title>
<meta charset="utf-8">
</head>
<body>
<div>
<h1>Day 3</h1>
<h2>Understanding HTTP concepts</h2>
<p>When you start building web applications, you would be dealing with HTTP a lot.</p>
<h3>What's HTTP?</h3>
<p>HTTP means HyperText Transfer Protocol. A protocol is just a convention for computer dialogs. It assumes that the computers participating in the conversation have a way of sending messages to each other.</p>
<p>HTTP uses a simple conversation pattern: the client connects to the server, initiates the dialog by asking the server for something, the server then tries to provide the client with an answer (back).</p>
<p>An example of a simple HTTP request and response would be the client asking "give me the file A" and the server answering "I have it, here is the content of file A: (...content of file A...)".</p>
<p>But in reality the protocol uses its own simplified language which looks more like "GET /fileA" answered by "200 OK (...content of file A...)".</p>
</div>
<div style="width:image 560 px; font-size:80%; text-align:center;"><img src="http://rubylearning.com/images/HTTP1.jpg" alt="HTTP" width="560" style="padding-bottom:0.5em;" /><br />HTTP</div>
<div>
<p>The basic principle of HTTP, and the Web in general, is that every resource (such as a web page) available on the Web has a distinct Uniform Resource Locator (URL), and that web clients can use HTTP verbs such as <code>GET</code>, <code>POST</code>, <code>PUT</code>, and <code>DELETE</code> to retrieve or otherwise manipulate those resources.</p>
<h4>Loading a web page</h4>
<p>When a browser loads a web page, it does a <code>GET</code> request for the URL requested. The content of the page is usually in a special text format called HTML that allows links, images and various style effects.</p>
<p>The browser analyses the HTML in order to display it to the user. But doing so, it may find that it needs more content to render the page correctly, like images.</p>
<p>When that happens, the browser initiates one HTTP request for each image.</p>
<p>When all the necessary information is downloaded, it is combined into the page the user sees on his screen.</p>
</div>
<div style="width:image 560 px; font-size:80%; text-align:center;"><img src="http://rubylearning.com/images/HTTP2.jpg" alt="HTTP" width="560" style="padding-bottom:0.5em;" /><br />HTTP</div>
<div>
<h3>HTTP request methods (verbs)</h3>
<h4>GET</h4>
<p>A resource is some chunk of information that can be identified by a URL (it's the R in URL - Uniform Resource Locator). The most common kind of resource is a file, but a resource may also be a dynamically-generated query result, a document that is available in several languages, or something else.</p>
<p><code>GET</code> is the most common HTTP method; it says "give me this resource". Method names are always uppercase.</p>
<p>A typical request looks like this:</p>
<pre>GET /path/to/file/index.html HTTP/1.1
</pre>
<p>The path is the part of the URL after the host name, also called the request URI (a URI is like a URL, but more general).</p>
<p>The HTTP version always takes the form "HTTP/x.x", uppercase.</p>
<h4>POST</h4>
<p>A <code>POST</code> request is used to send data to the server to be processed in some way. A <code>POST</code> request is different from a <code>GET</code> request in the following ways:</p>
<ul>
<li>There's a block of data sent with the request, in the message body.</li>
<li>The request URI is not a resource to retrieve; it's usually a program to handle the data you're sending.</li>
<li>The HTTP response is normally program output, not a static file.</li>
</ul>
<h4>PUT</h4>
<p>A <code>PUT</code> request uploads a document from either the local file system or a remote HTTP server to the destination HTTP server.</p>
<h4>DELETE</h4>
<p>A <code>DELETE</code> request deletes a resource (or makes it unavailable) for future references.</p>
<h3>HTTP response codes</h3>
<p>After a client (such as a web browser) sends a request corresponding to one of the HTTP verbs, the web server responds with a numerical code indicating the HTTP status of the response. For example, a status code of 200 means "success", and a status code of 301 means "permanent redirect". If you install <b>curl</b> (when you installed the Bash shell, curl was also installed), a command-line client that can issue HTTP requests, you can see this directly at, e.g., www.satishtalim.com (where the --head flag prevents curl from returning the whole page). Open a Bash shell and type:</p>
<pre>$ curl --head www.satishtalim.com
HTTP/1.1 200 OK
.
.
.
</pre>
<p>Here satishtalim.com indicates that the request was successful by returning the status 200 OK. In contrast, google.com is permanently redirected (to www.google.com, naturally), indicated by status code 301 (a "301 redirect"):</p>
<pre>$ curl --head rubylearning.org
HTTP/1.1 301 Moved Permanently
Location: http://www.google.com/
.
.
.
</pre>
<p>The Internet has become an inescapable part of software development, and Ruby has a significant number of libraries available to deal with the plethora of Internet services available.</p>
<h3>net/http library</h3>
<p>The <a href="http://www.ruby-doc.org/stdlib-1.9.3/libdoc/net/http/rdoc/index.html">net/http</a> library comes standard with Ruby and provides a simple client to fetch headers and web page contents using the HTTP and HTTPS protocols. The <code>Net::HTTP</code> is designed to work closely with <code>URI</code>. <code>URI::HTTP#host</code>, <code>URI::HTTP#port</code> and <code>URI::HTTP#request_uri</code> are designed to work with <code>Net::HTTP</code>. The <code>URI</code> class is automatically loaded by <code>net/http</code>.</p>
<h4>Using URI</h4>
<p>Let's write a program <code>neturi1.rb</code> and store it in the folder <code>my_ruby_programs</code>:</p>
<pre># neturi1.rb
require 'net/http'
uri = URI('http://rubylearning.com/data/text.txt')
res = Net::HTTP.get_response(uri)
puts res.code # => '200'
puts res.message # => 'OK'
puts Net::HTTP.get(uri) # => String
</pre>
<p>This example loads the <code>net/http</code> library and connects to the URL http://rubylearning.com/data/text.txt</p>
<p>The <code>Net::HTTP.get_response(uri)</code> sends a <code>GET</code> request to the target and returns the HTTP response as a <code>Net::HTTPResponse</code> object. The target is specified as (uri). We can use the <code>code</code> and <code>message</code> methods to return the response data.</p>
<p>The <code>Net::HTTP.get(uri)</code> performs an HTTP <code>GET</code> request for <code>text.txt</code>. This file's contents are then returned and displayed. If you load the URL http://rubylearning.com/data/text.txt in your web browser, you'll get the same response as Ruby.</p>
<p>The <a href="http://www.ruby-doc.org/stdlib-1.9.3/libdoc/net/http/rdoc/Net/HTTP.html#method-i-start">Net::HTTP.start</a> method opens a TCP connection and HTTP session. When this method is called with a block, it passes the <code>Net::HTTP</code> object to the block, and closes the TCP connection and HTTP session after the block has been executed.</p>
<p>Let's write a program that uses the <code>start</code>. Let's call it <code>nethttp1.rb</code> and store it in the folder <code>my_ruby_programs</code>:</p>
<pre># nethttp1.rb
require 'net/http'
url = URI.parse('http://rubylearning.com/data/text.txt')
Net::HTTP.start(url.host, url.port) do |http|
req = Net::HTTP::Get.new(url.path)
puts http.request(req).body
end
</pre>
<p>The <code>URI</code> class is used to parse the supplied URL. An object is returned whose methods <code>host</code>, <code>port</code>, and <code>path</code> supply different parts of the URL for <code>Net::HTTP</code> to use. Note that in this example you provide two parameters to the main <code>Net::HTTP.start</code> method: the URL's hostname and the URL's port number. The port number is optional, but <code>URI.parse</code> is clever enough to return the default HTTP port number of 80.</p>
<h3>Using open-uri</h3>
<p><code>open-uri</code> is a standard Ruby library that wraps up the functionality of <code>net/http</code>, <code>net/https</code>, and <code>net/ftp</code> into a single package. With <code>open-uri</code> retrieving a document from the Web becomes much like opening a text file on the local machine.</p>
<pre># netouri.rb
require 'open-uri'
f = open('http://rubylearning.com/data/text.txt')
puts f.readlines.join
</pre>
<p>As with <code>File::open</code>, <code>open</code> returns an I/O object (technically a <code>StringIO</code> object), and you can use a method such as <code>readlines</code>.</p>
<h3>Using Hpricot</h3>
<p>The <b>Hpricot</b> library is not part of Ruby but is very popular. Hpricot is a very rich Ruby library designed to make HTML parsing fast, easy, and fun. We'll only scratch the surface here.</p>
<p>To install Hpricot, open a new command window and type:</p>
<pre>gem install hpricot
</pre>
<p>Hpricot can work directly with <code>open-uri</code> to load HTML from remote files.</p>
<pre># nethpcot.rb
require 'open-uri'
require 'hpricot'
page = Hpricot(open('http://rubylearning.com'))
puts "Page title is: " + page.at(:title).inner_html
</pre>
<h3>Using Nokogiri</h3>
<p>Another library worth looking into it is <a href="http://nokogiri.org/">Nokogiri</a>. Nokogiri is an <a href="http://www.wikipedia.org/wiki/HTML">HTML</a>, <a href="http://www.wikipedia.org/wiki/XML">XML</a>, <a href="http://www.wikipedia.org/wiki/XML#Simple_API_for_XML_.28SAX.29">SAX</a> parser/reader with the ability to search documents via <a href="http://www.wikipedia.org/wiki/XPath">XPath</a> or <a href="http://www.wikipedia.org/wiki/CSS3#CSS_3">CSS3</a> selectors.</p>
<p>You can install Nokogiri with following command:</p>
<pre>gem install nokogiri</pre>
<h4>Fetching documents from web</h4>
<p>You can combine <code>nokogiri</code> with <code>open-uri</code> to fetch document like so:</p>
<pre># nokogiri_demo.rb
require 'nokogiri'
require 'open-uri'
doc = Nokogiri::HTML(open("http://rubylearning.com/"))
</pre>
<h4>Searching inside HTML documents</h4>
<p>Nokogiri enables you to seach documents with XPath or CSS3 selectos like so:<p>
<pre># nokogiri_demo2.rb
require 'nokogiri'
require 'open-uri'
doc = Nokogiri::HTML(open("http://rubylearning.com/"))
# Search with XPath
doc.xpath("//h2[@id='slogan']").first.content
# output: "Helping Ruby Programmers become Awesome!"
# Search with CSS3
doc.css("#footer p strong:first-child")[0].content
# output: "RubyLearning.com - A Ruby Tutorial"
</pre>
</div>
</body>
</html>