Patent - Details

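Patent name	Systems and methods of directionally guided, discriminate crawling of internet real estate listings
Application number	AU2007100279	Application date
Publication (announcement) number	AU2007100279A4	Publication (announcement) date
Applicant (patentee)	BREEZ BRANDER	Inventor	Brander Breez
Patent source	China National Intellectual Property Administration	Transfer method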
Abstract	A web crawler comprising a plurality of crawlers (modules). Each module thread determines the initial URLs, from which information is downloaded and the crawl is directed further, by retrieving the pre-set entry URLs and their corresponding web pages with real estate content, downloading the documents corresponding to the entry URLs, and processing those documents and web pages. If an entry URL contains real estate listings, the crawler thread filters the web page with pre-set filters, extracts the relevant words and hyperlinks, processes and categorises the filtered data, and enters it into a database. It then filters the same page for hyperlinks to similar real estate listing pages on the same website and processes those pages in the same way as the initial one. The hyperlink filters use unique dynamic dictionary and thesaurus procedures. If the crawler thread does not find any pre-set typical content on the initial crawled web page, it uses the same dictionary and thesaurus procedures to determine whether the page's source code has merely changed or the page genuinely contains no real estate listings data. If it determines that the page does contain real estate listings data and that the source code has changed, it updates the filters to remember the new code and then processes the page as above: it extracts the real estate listing words and hyperlinks (data), processes and categorises the data, enters it into the database, and follows hyperlinks to similar listing pages on the same website. If the crawler thread finds no real estate listings data on the initial page after both procedures, it filters the page only for hyperlinks whose wording indicates that they lead to a page on this website where real estate listings might be found.
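The three-way decision described in this paragraph (pre-set filter hit, changed markup detected through the dictionary and thesaurus, or no listings at all) might be sketched as follows. This is only an illustration under stated assumptions: the `THESAURUS` contents, the `min_concepts` threshold, and the function names `matched_concepts`, `classify_page`, and `candidate_links` are hypothetical and do not come from the patent, and the regex-based HTML handling is a simplification.

```python
import re

# Hypothetical thesaurus: each listing concept maps to words that signal it.
THESAURUS = {
    "price": {"price", "asking", "cost"},
    "bedroom": {"bedroom", "bedrooms", "bed", "beds"},
    "property": {"property", "house", "apartment", "unit", "home"},
    "sale": {"sale", "rent", "lease", "auction"},
}


def matched_concepts(text):
    """Return the thesaurus concepts that have at least one synonym in text."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    return {concept for concept, synonyms in THESAURUS.items() if words & synonyms}


def classify_page(html, preset_markers, min_concepts=3):
    """Classify a page as 'listing', 'listing_changed_markup', or 'no_listing'.

    preset_markers stands in for the pre-set filters: substrings of source
    code that are known to appear on listing pages (an assumption).
    """
    if any(marker in html for marker in preset_markers):
        return "listing"
    # Pre-set content is missing: strip tags and fall back to the thesaurus.
    text = re.sub(r"<[^>]+>", " ", html)
    if len(matched_concepts(text)) >= min_concepts:
        return "listing_changed_markup"
    return "no_listing"


def candidate_links(html):
    """Return hrefs whose URL or anchor text contains a thesaurus word."""
    pattern = r'<a\s[^>]*href="([^"]+)"[^>]*>(.*?)</a>'
    links = []
    for href, anchor in re.findall(pattern, html, flags=re.I | re.S):
        if matched_concepts(href + " " + anchor):
            links.append(href)
    return links
```

A caller that receives `listing_changed_markup` would, per the abstract, record a marker from the new source code in its filter set before extracting data; that update step is left out here because the abstract does not say how the new code is remembered.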
The crawler thread processes the entire initial website and then follows all outbound hyperlinks (hyperlinks to other websites on different domain names and/or Internet Protocol addresses) that it found on this website and flagged (remembered). The same crawl procedure then repeats on the next website, and on each one after it, until the crawler finds no more matching websites to crawl.
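The site-by-site traversal described above (finish the current website, then move on to flagged outbound links) could look roughly like the sketch below. It assumes the caller supplies three hypothetical callables, `fetch`, `extract_links`, and `is_relevant`, none of which are named in the patent. Comparing host names with `urlparse(...).netloc` is a simplification of "different domain names and/or Internet Protocol addresses".

```python
from collections import deque
from urllib.parse import urljoin, urlparse


def crawl(entry_urls, fetch, extract_links, is_relevant):
    """Crawl each site fully, then continue with flagged outbound links.

    fetch(url) returns page source or None, extract_links(html) returns a
    list of hrefs, and is_relevant(html) returns a bool. All three are
    hypothetical hooks standing in for the patent's download and filter steps.
    """
    sites = deque(entry_urls)  # entry points of websites still to crawl
    seen = set(entry_urls)     # every URL ever queued, to avoid repeats
    relevant_pages = []

    while sites:
        start = sites.popleft()
        host = urlparse(start).netloc
        pages = deque([start])  # pages of the current website only

        while pages:
            url = pages.popleft()
            html = fetch(url)
            if html is None:
                continue
            if is_relevant(html):
                relevant_pages.append(url)
            for href in extract_links(html):
                target = urljoin(url, href)
                if target in seen:
                    continue
                seen.add(target)
                if urlparse(target).netloc == host:
                    pages.append(target)   # same website: crawl it now
                else:
                    sites.append(target)   # outbound: flag for a later pass

    return relevant_pages
```

Passing the download and filter steps in as functions keeps the traversal order separate from the filtering logic, so the loop can be exercised against an in-memory set of pages without network access.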