Create Simple Web Crawler Using PHP And MySQL
Last Updated : Jul 1, 2023
In this tutorial we will show you how to create a simple web crawler using PHP and MySQL. A web crawler visits webpages, collects details such as the page title, description and links, and stores them in a database for a search engine to use.
The web crawler is one of the most important parts of a search engine, because the data it collects is what lets a search return the results the user is looking for.
Creating this simple web crawler takes only one step:
- Make a PHP file to crawl webpages and store details in database
Step 1. Make a PHP file to crawl webpages and store details in database
We make a PHP file and save it as crawl.php.
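```sql
-- Database structure (identifiers use backticks, not single quotes)
CREATE TABLE `webpage_details` (
  `link` text NOT NULL,
  `title` text NOT NULL,
  `description` text NOT NULL,
  `internal_link` text NOT NULL
) ENGINE=MyISAM DEFAULT CHARSET=utf8mb4;
```

```php
<?php
$main_url = "http://samplesite.com";

// parse_url() splits the URL into scheme, host, path, ...; stop if it is malformed
$url = parse_url($main_url);
if (!isset($url['scheme'], $url['host'])) {
    die("Invalid URL: $main_url");
}

$str = file_get_contents($main_url);
if ($str === false || $str === '') {
    die("Could not fetch $main_url");
}

// Gets Webpage Title
// Collapse whitespace so a <title> split over several lines still matches
$str = trim(preg_replace('/\s+/', ' ', $str));
$title = '';
// Non-greedy (.*?) stops at the first </title>; the i flag ignores case
if (preg_match('/<title[^>]*>(.*?)<\/title>/i', $str, $matches)) {
    $title = trim($matches[1]);
}

// Gets Webpage Description
// get_meta_tags() returns the <meta name="..." content="..."> pairs of the page
$tags = get_meta_tags($main_url) ?: [];
$description = $tags['description'] ?? '';

// Gets Webpage Links (every href on the page)
$doc = new DOMDocument();
libxml_use_internal_errors(true); // real-world HTML is rarely valid; keep parser warnings quiet
$doc->loadHTML($str);
libxml_clear_errors();
$sec_url = [];
foreach ($doc->getElementsByTagName('a') as $anchor) {
    $href = trim($anchor->getAttribute('href')); // '' when the anchor has no href
    if ($href !== '') {
        $sec_url[] = $href;
    }
}
$all_links = implode(",", $sec_url);

// Store Data In Database
// PDO with a prepared statement: the old mysql_* functions were removed in PHP 7,
// and binding values prevents quotes in a title from breaking the query
$host = "localhost";
$username = "root";
$password = "";
$databasename = "sample";
$pdo = new PDO("mysql:host=$host;dbname=$databasename;charset=utf8mb4", $username, $password,
    [PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION]);
$stmt = $pdo->prepare(
    "INSERT INTO webpage_details (link, title, description, internal_link) VALUES (?, ?, ?, ?)"
);
$stmt->execute([$main_url, $title, $description, $all_links]);
?>
```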
In this step we create a table called webpage_details to store the webpage details extracted by our crawler. To start, we set the webpage URL, fetch its content using the file_get_contents() function, and then use a regular expression to get the webpage title.
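The title step can be pulled out into a small function, which makes it easy to test without fetching a live page. This is a minimal sketch: the name extractTitle() is hypothetical, and decoding entities such as &amp;amp; with html_entity_decode() is an assumption added here, not part of the tutorial's code.

```php
<?php
// Hypothetical helper: returns the text of the first <title> element, or ''.
function extractTitle(string $html): string
{
    // Collapse runs of whitespace so a <title> split over several lines still matches
    $flat = trim(preg_replace('/\s+/', ' ', $html));

    // [^>]* tolerates attributes like <title lang="en">; (.*?) stops at the first </title>
    if (preg_match('/<title[^>]*>(.*?)<\/title>/i', $flat, $matches)) {
        // Assumption: store plain text, so turn &amp; etc. back into characters
        return html_entity_decode(trim($matches[1]), ENT_QUOTES | ENT_HTML5, 'UTF-8');
    }
    return ''; // no <title> element found
}

echo extractTitle("<html><head>\n<TITLE>\n  Tom &amp; Jerry\n</TITLE></head></html>"), PHP_EOL;
```

The example prints `Tom & Jerry`. A regular expression is usually enough for the title; for messier markup the DOMDocument parser used for the links step is the more robust choice.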
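To get the webpage description we use the parse_url() function, which splits the URL into its parts (scheme, host, path), and the get_meta_tags() function, which reads the page's meta tags into an array keyed by their name attribute, so the description is available as $tags['description'].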
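Because get_meta_tags() accepts a local file as well as a URL, its behaviour can be checked offline. The sketch below writes a made-up sample page to a temporary file (the HTML is invented for illustration) and reads it back; note that the array keys are lower-cased, so `name="Description"` ends up under `description`.

```php
<?php
// Made-up sample page, written to a temporary file so no network access is needed
$html = '<html><head><title>Demo</title>'
      . '<meta name="Description" content="A page about crawlers">'
      . '<meta name="keywords" content="php, mysql">'
      . '</head><body></body></html>';
$file = tempnam(sys_get_temp_dir(), 'page');
file_put_contents($file, $html);

// get_meta_tags() returns name => content pairs, with the names lower-cased
$tags = get_meta_tags($file) ?: [];
unlink($file); // clean up the temporary file

// Fall back to an empty string when the page has no description
$description = $tags['description'] ?? '';
echo $description, PHP_EOL;
```

This prints `A page about crawlers`. The `?? ''` fallback matters in a real crawl, since many pages have no description meta tag at all.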
Now we only have to get all the links present on the webpage. To do this, we create a DOMDocument object, load the HTML into it, and get all the anchor tags using the getElementsByTagName() function. A foreach loop then reads the href attribute of each anchor, which gives us the link URL. Finally, we join the links into one comma-separated string and store all the details in our database table.
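The link step can likewise be written as a reusable function. This is a sketch with a hypothetical name, extractLinks(); it returns every non-empty href (internal or external) and uses getAttribute(), which returns an empty string for anchors such as `<a name="top">` that have no href.

```php
<?php
// Hypothetical helper: returns the href of every <a> element that has one.
function extractLinks(string $html): array
{
    $doc = new DOMDocument();
    // Collect libxml warnings about sloppy HTML instead of printing them
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous); // restore the caller's setting

    $links = [];
    foreach ($doc->getElementsByTagName('a') as $anchor) {
        $href = trim($anchor->getAttribute('href')); // '' when href is missing
        if ($href !== '') {
            $links[] = $href;
        }
    }
    return $links;
}

$html = '<p><a href="/about">About</a> <a name="top">Top</a> '
      . '<a href="http://example.com/">Example</a></p>';
print_r(extractLinks($html));
```

The result is `['/about', 'http://example.com/']`: the anchor without an href is skipped rather than causing an error.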
That's all; this is how to create a simple web crawler using PHP and MySQL. You can customize this code further to suit your requirements, and please feel free to leave comments on this tutorial.
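One common customization is to follow the collected links so the crawler visits more than one page. The sketch below is a hypothetical extension, not part of the tutorial: crawl() does a breadth-first walk over pages on the starting host, up to a page limit. It assumes PHP 8 (for str_starts_with()), only resolves root-relative links like `/about`, and takes the fetcher as a callable so it can be tested with stub pages; in practice you would pass 'file_get_contents'.

```php
<?php
// Hypothetical breadth-first crawl restricted to the starting host.
function crawl(string $startUrl, callable $fetch, int $maxPages = 10): array
{
    $start = parse_url($startUrl);
    $origin = $start['scheme'] . '://' . $start['host'];
    $queue = [$startUrl];
    $visited = [];

    while ($queue && count($visited) < $maxPages) {
        $url = array_shift($queue); // FIFO queue gives breadth-first order
        if (isset($visited[$url])) {
            continue; // already crawled
        }
        $html = $fetch($url);
        $visited[$url] = true;
        if (!is_string($html) || $html === '') {
            continue; // fetch failed or page was empty
        }

        $doc = new DOMDocument();
        libxml_use_internal_errors(true);
        $doc->loadHTML($html);
        libxml_clear_errors();

        foreach ($doc->getElementsByTagName('a') as $a) {
            $href = trim($a->getAttribute('href'));
            // Turn root-relative links ("/about") into absolute URLs;
            // other relative forms are ignored in this simple sketch
            if (str_starts_with($href, '/') && !str_starts_with($href, '//')) {
                $href = $origin . $href;
            }
            // Queue only pages on the starting host
            if ($href === $origin || str_starts_with($href, $origin . '/')) {
                $queue[] = $href;
            }
        }
    }
    return array_keys($visited); // URLs in the order they were crawled
}
```

Each URL returned could then be run through the title, description and link steps above and stored as its own row. The page limit keeps a first run from wandering through an entire site.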
I hope this tutorial on building a web crawler in PHP helps you, and that the steps above are easy to follow and implement.