rohitstatic edited this page Mar 11, 2011 · 5 revisions

Welcome to the alipi-crawler wiki!

This project is a crawler that forms part of the alipi project. It parses HTML pages from a fixed set of URLs and checks each page for a specific attribute, 'foruri', which marks an element as a renarration of some other content. The crawler produces JSON output, structured as a dictionary of dictionaries, whose key:value pairs help determine the replacements. A separate program then uses this output to perform the renarration.
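The idea can be sketched with Python's standard library. This is only an illustration, not the crawler's actual code: the names `ForuriCollector` and `collect_renarrations` are hypothetical, and the layout of the resulting dictionary (keyed by the 'foruri' value) is an assumption, since this page does not specify the real JSON schema.

```python
import json
from html.parser import HTMLParser


class ForuriCollector(HTMLParser):
    """Collect the attributes of every element that carries 'foruri'."""

    def __init__(self):
        super().__init__()
        self.found = []

    def handle_starttag(self, tag, attrs):
        # attrs arrives as a list of (name, value) pairs.
        attr_map = dict(attrs)
        if "foruri" in attr_map:
            self.found.append({"tag": tag, **attr_map})


def collect_renarrations(html_text):
    """Return a dictionary of dictionaries keyed by the 'foruri' value.

    The key choice is an assumption made for this sketch.
    """
    parser = ForuriCollector()
    parser.feed(html_text)
    result = {}
    for entry in parser.found:
        target = entry["foruri"]
        # Keep every other attribute as the inner dictionary.
        result[target] = {k: v for k, v in entry.items() if k != "foruri"}
    return result


sample = (
    '<p id="hi" foruri="a11y_firesafety.html:div1" rec="lang:hi">'
    "<span>renarrated text</span></p>"
)
print(json.dumps(collect_renarrations(sample), indent=2))
```

A real crawler would also fetch each page from its fixed URL list and would probably record the renarrated text itself. Both are left out here to keep the sketch short.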

An example can be found on the page http://www.a11y.in/.

<p id="hi" foruri="a11y_firesafety.html:div1" rec="lang:hi">
<span>
आग विभाग या आगशमक दल एक सार्वजनिक् या निजी समस्था है जो आग से होने वाली दुर्घटनाऒ से सुरक्षा प्रधान करती है, जो आम तौर पे एक नगर - पालिका या जिल्ला का निरिक्षण् करती है | एक विभाग के सिमा में आम तौर पे एक् से अधिक् आग शमक् स्टेशन् होते हैं | इन् सटेशनो मे व्याव्सयिक आग शमक या स्वयंसेवक कार्य करते है | </span>
</p>

The code above is part of that page. The 'foruri' attribute gives the URL and the id of the HTML element on that page which this element renarrates. There is also a 'rec' attribute, which provides the recommendation. In the example above it is 'lang:hi', meaning the recommendation is for the Hindi language.
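A minimal sketch of how these two attribute values might be taken apart is shown below. The helper names `split_foruri` and `split_rec` are hypothetical, and the rule of splitting 'foruri' on its last colon is an assumption based only on the single example here. The last colon is chosen so that an absolute URL containing `http://` would still split correctly.

```python
def split_foruri(value):
    """Split 'url:element_id' on the LAST colon (assumed format)."""
    url, _, element_id = value.rpartition(":")
    return url, element_id


def split_rec(value):
    """Split 'kind:setting' on the FIRST colon (assumed format)."""
    kind, _, setting = value.partition(":")
    return kind, setting


url, element_id = split_foruri("a11y_firesafety.html:div1")
kind, setting = split_rec("lang:hi")
print(url, element_id, kind, setting)
```

With the values from the example, this yields the target page `a11y_firesafety.html`, the target element id `div1`, and a recommendation of kind `lang` with the setting `hi`.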

Note: The 'foruri' and 'rec' attributes are not standard attributes of any HTML tag. They are user-defined.
