Crawler
The crawler has two purposes: to collect and maintain a list of source URLs, and to check those sources regularly for new pages.
The list of URLs combines sites specified by clients with sites discovered by scraping the web for blogs, forums and news sites that match industry-specific keywords.
New URLs are added every day to keep the list up to date with new sites. All of these URLs are then visited several times a day (in some cases, every hour), and any new pages are retrieved and passed on to the Analysis Pipeline.
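
The two responsibilities described above (maintaining the source list, and revisiting each source on a schedule to pick up new pages) could be sketched roughly as follows. This is a minimal illustration, not the actual crawler: the names `Source`, `Crawler`, `list_pages` and `send_to_pipeline` are hypothetical, the six-hour default interval is an assumed reading of "several times a day", and page discovery is left as a caller-supplied function so that no real network access is implied.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, List, Optional, Set


@dataclass
class Source:
    """One entry in the source list (hypothetical structure)."""
    url: str
    interval_s: int = 6 * 3600            # assumed default: four visits a day
    last_checked: Optional[float] = None  # None means never visited
    seen_pages: Set[str] = field(default_factory=set)


class Crawler:
    def __init__(
        self,
        list_pages: Callable[[str], Iterable[str]],
        send_to_pipeline: Callable[[str], None],
    ) -> None:
        # list_pages(source_url) returns the page URLs currently found there;
        # send_to_pipeline(page_url) stands in for the Analysis Pipeline hand-off.
        self.list_pages = list_pages
        self.send_to_pipeline = send_to_pipeline
        self.sources: Dict[str, Source] = {}

    def add_source(self, url: str, interval_s: int = 6 * 3600) -> None:
        """Add a client-specified or discovered source; duplicates are ignored."""
        self.sources.setdefault(url, Source(url, interval_s))

    def due(self, now: float) -> List[Source]:
        """Return the sources whose revisit interval has elapsed."""
        return [
            s for s in self.sources.values()
            if s.last_checked is None or now - s.last_checked >= s.interval_s
        ]

    def run_once(self, now: float) -> List[str]:
        """Visit every due source and forward pages not seen before."""
        new_pages: List[str] = []
        for source in self.due(now):
            for page in self.list_pages(source.url):
                if page not in source.seen_pages:
                    source.seen_pages.add(page)
                    self.send_to_pipeline(page)
                    new_pages.append(page)
            source.last_checked = now
        return new_pages
```

In this sketch, a source that must be checked every hour would be registered with `add_source(url, interval_s=3600)`, and `run_once` would be called periodically by whatever scheduler is in use. Tracking seen pages per source is one simple way to decide what counts as "new"; a real system might instead compare content hashes or publication dates.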




