Welcome to Vigges Developer Community - Open, Learning, Share
Welcome To Ask or Share your Answers For Others


Web scraping: how to bypass reCAPTCHA with Python requests / how to successfully parse ae.domain.com websites

CODE:

import requests
from bs4 import BeautifulSoup

HEADERS = {
    'authority': 'ae.pricena.com',
    'pragma': 'no-cache',
    'cache-control': 'no-cache',
    'dnt': '1',
    'upgrade-insecure-requests': '1',
    'accept': '*/*',
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36',
    'sec-fetch-site': 'none',
    'sec-fetch-mode': 'navigate',
    'sec-fetch-dest': 'document',
    'referer': 'https://www.google.com/',
}
# A dict cannot hold two 'http' keys: the second entry silently replaced the
# first ('http://2.50.18.118'). An https:// URL also only uses the 'https' entry.
proxies = {
    'http': 'http://217.164.255.35',
    'https': 'http://217.164.255.35',
}
r = requests.get(
    'https://ae.pricena.com/en/mobile-tablets/mobile-phones',
    headers=HEADERS,
    proxies=proxies,
)

# Parse the response body and print it to see what the server returned
soup = BeautifulSoup(r.content, 'lxml')
print(soup)

Output:

<style> body { margin: 20px auto; padding:20px; width: 1050px; font-family: "Helvetica Neue",Helvetica,Arial,sans-serif; font-size: 20px; line-height: 3; color: #333; background-color: #fff; } .logo { width: 100%; display: block; position: relative; clear: both; } .blue { display: block; position: relative; font-size: 24px; color:#08c; width: 100%; margin-top: 50px; clear: both; } #img-logo { display: block; background-repeat: no-repeat; background-position: left 0; background-color: transparent; background-image: url(//ae.pricenacdn.com/images/default/logo_en.png?v=5); background-image: linear-gradient(transparent,transparent),url(//ae.pricenacdn.com/images/default/pricena_logo_en.svg?v=1); text-indent: -99999px; width: 300px; height: 80px; text-indent: -99999px; float: left; margin: 10px 10px 10px -5px; } button { text-align: center; font-size: 14px; padding: 10px 30px; border: 0; border-radius: 2px; -webkit-border-radius: 2px; -moz-border-radius: 2px; cursor: pointer; color: #fff; background-color: #71bf44; border: 1px solid #6ebd44; } @media only screen and (min-device-width : 320px) and (max-device-width : 756px) { body { margin: 20px 3%; padding:20px 3%; width: 88%; font-size: 16px; line-height: 2; } .blue { font-size: 18px; } }</style><div id="wrapper"> <form action="https://ae.pricena.com/en/captcha/submit/" id="captcha_form" method="post"> <div class="captcha white-page col-12"> <div class="logo"><a href="http://pricena.com" id="img-logo">Pricena</a></div> <div class="blue">Hmmmmmm.... </div> <div class="text">Looks like you really like Pricena! To continue browsing we need to make sure you are human :) <br/> Just check the box below and you're good to go. 
</div> <div class="text">If this doesn't work, please get in touch with us on info[at]pricena.com.</div> <br/> <div class="g-recaptcha" data-callback="captchaAction" data-sitekey="6LefpIoUAAAAAKVNoyiwWXuwwvBp2iACEHuaTC8s"></div> <input name="YII_CSRF_TOKEN" type="hidden" value="5cd25f64ab3aaa8c18be960886a38d8d6714159b"/> <input name="requestedLink" type="hidden" value="https://ae.pricena.com/en/mobile-tablets/mobile-phones"/> </div> </form></div><script src="https://www.google.com/recaptcha/api.js?hl=en" type="text/javascript"></script><script type="text/javascript"> var captchaAction = function() { document.getElementById('captcha_form').submit(); };</script>

The site responds with a reCAPTCHA page instead of the product listing. Is there any way to avoid the reCAPTCHA by using a specific proxy? Or how can I bypass reCAPTCHA with requests and BeautifulSoup? I don't want to involve Selenium.
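
As a starting point, it may help to at least detect when the captcha page was served instead of the listing, so the script does not try to parse it as product data. The sketch below looks for the two markers visible in the output above (the `captcha_form` form and the `g-recaptcha` widget). The names `CaptchaDetector` and `is_captcha_page` are made up for this illustration, and it uses only the standard library so it runs without extra packages. It detects the page; it does not solve the captcha.

```python
from html.parser import HTMLParser


class CaptchaDetector(HTMLParser):
    """Flags the markers seen in the captcha page from the output above."""

    def __init__(self):
        super().__init__()
        self.found = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        # Valueless attributes come through as None, so fall back to ""
        classes = (attrs.get("class") or "").split()
        if attrs.get("id") == "captcha_form" or "g-recaptcha" in classes:
            self.found = True


def is_captcha_page(html):
    """Return True if html looks like the captcha interstitial, else False."""
    detector = CaptchaDetector()
    detector.feed(html)
    return detector.found


# Hypothetical sample markup, shortened from the output above
sample = '<form id="captcha_form"><div class="g-recaptcha"></div></form>'
print(is_captcha_page(sample))                      # True
print(is_captcha_page("<ul><li>phone</li></ul>"))   # False
```

With the code from the question, this would be called as `is_captcha_page(r.text)` before building the soup. Note that requests does not execute JavaScript, so it cannot tick the reCAPTCHA checkbox by itself; when the check returns True, the realistic options are to stop or slow down, or to ask the site for access through the contact address shown on the captcha page.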



1 Answer

Waiting for an expert to reply.

...