Metadata-Version: 2.1
Name: crawler
Version: 0.0.2
Summary: Web Scraping Framework based on py3 asyncio
Author: Gregory Petukhov
Author-email: lorien@lorien.name
License: MIT
Classifier: Programming Language :: Python
Classifier: Programming Language :: Python :: 3.4
Classifier: Programming Language :: Python :: Implementation :: CPython
Classifier: License :: OSI Approved :: MIT License
Classifier: Topic :: Software Development :: Libraries :: Application Frameworks
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Classifier: Topic :: Internet :: WWW/HTTP

=======
Crawler
=======

.. image:: https://travis-ci.org/lorien/crawler.png?branch=master
    :target: https://travis-ci.org/lorien/crawler

.. image:: https://coveralls.io/repos/lorien/crawler/badge.svg?branch=master
    :target: https://coveralls.io/r/lorien/crawler?branch=master

.. image:: https://pypip.in/download/crawler/badge.svg?period=month
    :target: https://pypi.python.org/pypi/crawler

.. image:: https://pypip.in/version/crawler/badge.svg
    :target: https://pypi.python.org/pypi/crawler

.. image:: https://landscape.io/github/lorien/crawler/master/landscape.png
    :target: https://landscape.io/github/lorien/crawler/master

Web scraping framework built on Python 3's asyncio and the aiohttp library.


Usage Example
=============

.. code:: python

    import re
    from itertools import islice

    from crawler import Crawler, Request

    # Extract the contents of the <title> tag, case-insensitively
    RE_TITLE = re.compile(r'<title>([^<]+)</title>', re.S | re.I)

    class TestCrawler(Crawler):
        def task_generator(self):
            # Yield a request for each of the first 100 domains in the file;
            # tag='page' routes each response to the handler_page() method
            with open('var/domains.txt') as inp:
                for host in islice(inp, 100):
                    host = host.strip()
                    if host:
                        yield Request('http://%s/' % host, tag='page')

        def handler_page(self, req, res):
            print('Result of request to {}'.format(req.url))
            try:
                title = RE_TITLE.search(res.body).group(1)
            except AttributeError:
                # No <title> tag found in the response body
                title = 'N/A'
            print('Title: {}'.format(title))

    bot = TestCrawler(concurrency=10)
    bot.run()
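
The task generator above reads domain names from ``var/domains.txt``, one per
line. A minimal way to prepare that file before running the script (the domain
names below are just placeholders):

.. code:: bash

    # Create the input file the example crawler expects (sample domains)
    mkdir -p var
    printf 'example.com\npython.org\n' > var/domains.txt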


Installation
============

.. code:: bash

    pip install crawler


Dependencies
============

* Python>=3.4
* aiohttp
