Abstract

Writing Web Bots with Python

Presentation Content Available

  • https://github.com/txjoe/Null-Bots-Talk
  • Remember to read the README file.

Introduction

  • Introduction to internet bots: what they are and what they can do.
  • Applications of internet bots from a security point of view.
  • Writing your first bot
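
As a rough idea of what a first bot does, the sketch below collects the links from a page. It is only an illustration, not the code used in the session: the session relies on Mechanize and Beautiful Soup 4, while this sketch assumes only the Python standard library and works on an inlined sample page, so it needs no network access. The names LinkCollector, extract_links and SAMPLE_PAGE are hypothetical.

```python
try:  # Python 3
    from html.parser import HTMLParser
except ImportError:  # Python 2.7, the version used in the session
    from HTMLParser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag it is fed."""

    def __init__(self):
        HTMLParser.__init__(self)
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the tag.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_links(html):
    """Return all link targets found in an HTML string, in document order."""
    collector = LinkCollector()
    collector.feed(html)
    collector.close()
    return collector.links


# Hypothetical sample page, inlined so nothing is downloaded.
SAMPLE_PAGE = """
<html><body>
  <a href="/about">About</a>
  <a href="https://example.com/contact">Contact</a>
  <a name="anchor-without-href">No link</a>
</body></html>
"""

print(extract_links(SAMPLE_PAGE))
```

A real bot would download the page first and then feed the response body to extract_links; the parsing step stays the same.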

Bypassing restrictions

  • How websites prevent web scraping.
  • How robots.txt works.
  • User-Agent checks and how to beat them.
  • CAPTCHAs.
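
The robots.txt and User-Agent topics above can be sketched with the standard library alone. This is a minimal illustration under stated assumptions, not the session's own code: the robots.txt content, the example.com URLs and the bot name ExampleBot/0.1 are all hypothetical, and the request is only built, never sent.

```python
try:  # Python 3
    from urllib.robotparser import RobotFileParser
    from urllib.request import Request
except ImportError:  # Python 2.7, the version used in the session
    from robotparser import RobotFileParser
    from urllib2 import Request

# Hypothetical robots.txt content, inlined so the sketch needs no network access.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

BOT_UA = "ExampleBot/0.1"  # hypothetical bot name


def is_allowed(robots_txt, user_agent, url):
    """Check whether robots.txt rules permit user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


def build_request(url, user_agent):
    """Build (but do not send) a request carrying a custom User-Agent header."""
    return Request(url, headers={"User-Agent": user_agent})


print(is_allowed(ROBOTS_TXT, BOT_UA, "http://example.com/index.html"))
print(is_allowed(ROBOTS_TXT, BOT_UA, "http://example.com/private/data"))

req = build_request("http://example.com/index.html", BOT_UA)
# The Request class stores header names capitalized, hence "User-agent".
print(req.get_header("User-agent"))
```

Note that robots.txt is advisory: the file only states the site's wishes, and it is the bot that chooses to honor them. User-Agent checks, by contrast, are enforced on the server, which is why the header a bot sends matters.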

Requirements

  • Attendees should have a laptop with a Linux distro, either installed natively or running in a virtual machine.
  • They must know the basics of Python (we will be using Python 2.7.x for the session). Download Python 2.7 here: https://www.python.org/downloads/release/python-2713/
  • Attendees must have the packages "Mechanize" and "Beautiful Soup 4" installed. Run the following commands to install them:

pip install mechanize
pip install beautifulsoup4

or

sudo apt-get install python-bs4
sudo apt-get install python-mechanize

  • Basic understanding of current web technologies (HTML, CSS, JS).

Legal Aspects

  • Scenarios where web scraping without prior permission can land people in trouble.
  • Case Study - The QVC web scraping ruling.
  • Case Study - eBay v. Bidder's Edge.
  • Case Study - United States v. Auernheimer.
  • Conclusion.

Speaker

Tabish Imran

Timing

Starts on Saturday, February 25, 2017, at 09:30 AM. The session runs for about 4 hours.

Resources