Command-line HTML parser that uses XPath selectors with xmltodict; a convenient step before piping into jq.

ICoder0/hq

hq

hq parses HTML with XPath expressions and dumps the matches as JSON, which makes it a convenient step before piping into jq.

Installation

python3 setup.py install

Best practice

  • Step 1: fetch the page with cURL
  • Step 2: extract values from the HTML/XML with XPath and dump them as JSON
  • Step 3: rearrange and combine the JSON with jq
curl http://testapi.cn | 
hq -x 'title=//xxxx' -x 'link=//xxxxx' |
jq '[.title,.link] | transpose | map({title:.[0],link:.[1]})' > test.json
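The jq filter in the pipeline above turns two parallel arrays into a list of records. As a rough illustration of that reshaping step only, here is a minimal Python sketch. It assumes hq emits each key as an array when an XPath matches several nodes (the README does not show this case), and `pair_fields` and the sample data are hypothetical names made up for the example.

```python
import json
from itertools import zip_longest


def pair_fields(doc, keys=("title", "link")):
    """Mimic: [.title,.link] | transpose | map({title:.[0],link:.[1]}).

    Hypothetical helper, not part of hq or jq.
    """
    # One column per key, e.g. [[titles...], [links...]].
    columns = [doc[k] for k in keys]
    # jq's transpose pads short rows with null; zip_longest does the
    # same with None.
    rows = zip_longest(*columns, fillvalue=None)
    return [dict(zip(keys, row)) for row in rows]


# Assumed shape of hq output when each XPath matches two nodes.
sample = {
    "title": ["First post", "Second post"],
    "link": ["/posts/1", "/posts/2"],
}
print(json.dumps(pair_fields(sample)))
```

If the two arrays differ in length, the missing values come out as null, matching what jq's transpose does.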

Usage

Without a specified key

Input: hq -x '/html/body/text()' '<html><body>123</body></html>'

Output: {"_0": "123"}

With a specified key

Input: hq -x 'test=/html/body/text()' '<html><body>123</body></html>'

Output: {"test":"123"}
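The two examples above suggest that each `-x` argument is either a bare XPath, which gets a positional key such as `_0`, or a `key=xpath` pair. The following is a sketch of one plausible way to split such an argument, not hq's actual implementation. `split_selector` is a hypothetical name, and the `_<index>` naming is an assumption inferred from the `_0` output shown above.

```python
import re

# A key is assumed to look like an identifier followed by "=".
# This keeps XPaths such as //a[@id="x"] from being split on their own "=".
# Caveat: a bare XPath comparison like a=b would still be read as key "a".
_KEYED = re.compile(r"^([A-Za-z_][A-Za-z0-9_]*)=(.+)$", re.DOTALL)


def split_selector(arg, index):
    """Return (key, xpath) for one -x argument (hypothetical helper)."""
    match = _KEYED.match(arg)
    if match:
        return match.group(1), match.group(2)
    # No explicit key: fall back to a positional name (assumed scheme).
    return f"_{index}", arg


print(split_selector("test=/html/body/text()", 0))
print(split_selector("/html/body/text()", 0))
```

Matching only an identifier-like prefix is a design choice for this sketch; splitting on the first `=` would be simpler but would misread predicates that contain `=`.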

With a specified file

Input: hq -x 'test=/html/body/text()' -f asd.html

Output: {"test":"123"}

With piped stdin

Input: cat asd.html | hq -x 'test=/html/body/text()'

Output: {"test":"123"}
