PHP - Query Google search engine with a date range?
I'm trying to query the Google search engine with a date range and process the first page of results. The query I'm using returns results that are not in the date range I set. If I paste the same query into Google, the date range works, but not from the PHP script. The script returns the current (normal) results, as if the date parameter were not set. Part of the code I'm using is below. The query I'm referring to is the one built in the `$url` variable.
Query: `https://www.google.com/search?q='.$query.'&source=lnt&tbs=cdr%3a1%2'.$startdate.$enddate.'&tbm=`
```php
include_once("simple_html_dom.php");

$query = $_POST['query'];
$query = str_replace(" ", "+", $query);

// Default range (11/1/2011 to 11/1/2013) when the date fields are empty
if ($_POST['start_date'] == '') {
    $startday = '1'; $startmonth = '11'; $startyear = '2011';
}
if ($_POST['end_date'] == '') {
    $endday = '1'; $endmonth = '11'; $endyear = '2013';
}

// The leading 'c' completes the preceding '%2', giving '%2c' (an encoded comma)
$startdate = 'ccd_min%3a' . $startmonth . '%2f' . $startday . '%2f' . $startyear . '%2';
$enddate   = 'ccd_max%3a' . $endmonth . '%2f' . $endday . '%2f' . $endyear;

if ($_POST['query'] != '') {
    $url = 'https://www.google.com/search?q=' . $query
         . '&source=lnt&tbs=cdr%3a1%2' . $startdate . $enddate . '&tbm=';
    echo $url . '<p>';
    $html = file_get_html($url);
    $searchresults = array();
    $linkobjs = $html->find('h3.r a');
    foreach ($linkobjs as $linkobj) {
        $link = trim($linkobj->href);
        // If it's not a direct link but a URL reference is found inside it, extract it
        if (!preg_match('/^https?/', $link) && preg_match('/q=(.+)&sa=/U', $link, $matches) && preg_match('/^https?/', $matches[1])) {
            $link = $matches[1];
        } else if (!preg_match('/^https?/', $link)) {
            // Skip if it's not a valid link
            continue;
        }
        array_push($searchresults, $link);
    }
}
```
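The snippet assembles the encoded `%3a`, `%2c` and `%2f` fragments by hand. You can build the same URL with PHP's `http_build_query()`, which does the percent-encoding for you. This is a minimal sketch: `buildDateRangeUrl()` is a hypothetical helper, and the `tbs=cdr:1,cd_min:...,cd_max:...` format is taken from the question's URL. It only changes how the URL is built. It does not make Google honour the date filter for a client that doesn't run JavaScript.

```php
<?php
// Hypothetical helper: builds the question's date-restricted search URL.
// http_build_query() percent-encodes ':' ',' and '/' (and turns spaces into
// '+'), so no hand-written '%3a' / '%2c' / '%2f' fragments are needed.
function buildDateRangeUrl(string $query, string $minDate, string $maxDate): string
{
    $params = [
        'q'      => $query,
        'source' => 'lnt',
        // Assumed format, copied from the question: cdr:1,cd_min:M/D/Y,cd_max:M/D/Y
        'tbs'    => "cdr:1,cd_min:$minDate,cd_max:$maxDate",
        'tbm'    => '',
    ];
    return 'https://www.google.com/search?' . http_build_query($params);
}

echo buildDateRangeUrl('lisbon travel', '11/1/2011', '11/1/2013'), "\n";
```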
Google serves a different HTML structure to clients that don't run JavaScript, and that is what `file_get_html($url)` receives. Temporarily disable JavaScript in Chrome and inspect the page. That way you'll be sure you're using the correct div IDs, classes, etc. in your script.
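One way to check which selectors actually match the non-JavaScript markup is to save such a page and run it through PHP's built-in DOM extension. This is a sketch with some assumptions. `extractResultHrefs()` is a hypothetical helper. The `h3.r a` structure comes from the question's selector and may not match what Google currently serves. The sample HTML is made up for illustration.

```php
<?php
// Hypothetical helper: returns the href of every link inside <h3 class="r">,
// i.e. the XPath equivalent of the CSS selector 'h3.r a' used with simple_html_dom.
function extractResultHrefs(string $html): array
{
    $doc = new DOMDocument();
    // Real-world HTML is rarely valid; collect libxml warnings instead of printing them.
    libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    $nodes = $xpath->query("//h3[contains(concat(' ', normalize-space(@class), ' '), ' r ')]//a");

    $hrefs = [];
    foreach ($nodes as $node) {
        $hrefs[] = trim($node->getAttribute('href'));
    }
    return $hrefs;
}

// Made-up sample standing in for a page saved with JavaScript disabled.
$sample = '<div><h3 class="r"><a href="/url?q=http://example.com/&amp;sa=U">Example</a></h3>'
        . '<h3 class="other"><a href="http://ignored.example/">Ignored</a></h3></div>';
print_r(extractResultHrefs($sample));
```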
Update based on comments:

Google doesn't allow searching by date range through a direct URL if JavaScript is disabled. However, you can still use the `daterange:` Google operator to find pages indexed by Googlebot within a specified date range. The dates must be submitted in Julian date format, and fractions must be omitted for the operator to work properly.

Example: `daterange:2452671-2452671 lisbon`

The `daterange:` operator requires at least one proper search term and can be combined with other operators.
gregoriantojd()

To convert a Gregorian date to a Julian date, you can use the PHP function `gregoriantojd(int $month, int $day, int $year)` from PHP's calendar extension. For example:
```php
$startdate = gregoriantojd(12, 28, 2011); // 2455924
$enddate   = gregoriantojd(12, 28, 2014); // 2457020
```
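`gregoriantojd()` is only available when PHP's calendar extension is enabled. If it isn't, you can compute the same Julian Day Number with the standard integer formula for the Gregorian calendar. `gregorianToJdn()` below is a hypothetical stand-in. The sketch checks it against the two values above.

```php
<?php
// Hypothetical fallback for gregoriantojd(): Gregorian date -> Julian Day Number
// using the standard integer formula (valid for proleptic Gregorian dates).
function gregorianToJdn(int $month, int $day, int $year): int
{
    $a = intdiv(14 - $month, 12);   // 1 for January/February, 0 otherwise
    $y = $year + 4800 - $a;         // shift the epoch so all years are positive
    $m = $month + 12 * $a - 3;      // March = 0 ... February = 11
    return $day + intdiv(153 * $m + 2, 5) + 365 * $y
        + intdiv($y, 4) - intdiv($y, 100) + intdiv($y, 400) - 32045;
}

echo gregorianToJdn(12, 28, 2011), "\n"; // 2455924
echo gregorianToJdn(12, 28, 2014), "\n"; // 2457020
```

Counting months from March puts the leap day at the end of the year, which keeps the day-of-year term a simple `intdiv(153 * $m + 2, 5)`.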
Your search `$url` should look like this:
```php
$url = "https://www.google.pt/search?q=lisbon+daterange:2455924-2457020&btnG=Search&num=100&gbv=1";
```
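Here is a sketch that builds that URL with `http_build_query()`, so the search term is encoded safely even if it contains spaces or symbols. `buildDaterangeUrl()` is hypothetical. The parameter names are copied from the URL above, and the `btnG` capitalisation is an assumption. The host is `google.com`, as in the final code. The empty-term check only enforces the rule that `daterange:` needs at least one real search term.

```php
<?php
// Hypothetical helper: builds a daterange: search URL from a term and two
// Julian Day Numbers. Parameters mirror the answer's example URL.
function buildDaterangeUrl(string $term, int $startJd, int $endJd, int $num = 100): string
{
    if (trim($term) === '') {
        // daterange: needs at least one proper search term next to it
        throw new InvalidArgumentException('daterange: needs a search term');
    }
    $params = [
        'q'    => "$term daterange:$startJd-$endJd",
        'btnG' => 'Search',
        'num'  => $num,
        'gbv'  => 1,   // copied from the answer's URL
    ];
    return 'https://www.google.com/search?' . http_build_query($params);
}

echo buildDaterangeUrl('lisbon', 2455924, 2457020), "\n";
```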
Final code:
```php
include_once("simple_html_dom.php");

$startdate = gregoriantojd(12, 28, 2011); // 2455924
$enddate   = gregoriantojd(12, 28, 2014); // 2457020
$nresults  = "100";
$query     = "lisbon";

$url = "https://www.google.com/search?q=" . urlencode($query)
     . "+daterange:$startdate-$enddate&btnG=Search&num=$nresults&gbv=1";
echo $url . '<p>';

$html = file_get_html($url);
$searchresults = array();
$linkobjs = $html->find('h3.r a');
foreach ($linkobjs as $linkobj) {
    $link = trim($linkobj->href);
    // If it's not a direct link but a URL reference is found inside it, extract it
    if (!preg_match('/^https?/', $link) && preg_match('/q=(.+)&sa=/U', $link, $matches) && preg_match('/^https?/', $matches[1])) {
        $link = $matches[1];
    } else if (!preg_match('/^https?/', $link)) {
        // Skip if it's not a valid link
        continue;
    }
    array_push($searchresults, $link);
}
print_r($searchresults);
/*
Array
(
    [0] => http://www.cnn.com/2014/01/25/travel/lisbon-coolest-city/
    [1] => http://www.tripadvisor.com/tourism-g189158-lisbon_lisbon_district_central_portugal-vacations.html
    etc...
*/
```
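The regex in the loop extracts the target of Google's redirect links (`/url?q=<target>&sa=...`). Here is a sketch of the same step using `parse_url()` and `parse_str()`, which also URL-decodes the target. `resolveResultLink()` is hypothetical, and the `/url?q=` href shape is assumed from the regex above.

```php
<?php
// Hypothetical helper: returns the real target of a result href, or null if
// the href is neither a direct http(s) link nor a redirect carrying one in 'q'.
function resolveResultLink(string $href): ?string
{
    $href = trim($href);
    if (preg_match('#^https?://#i', $href)) {
        return $href;                        // already a direct link
    }
    $queryString = parse_url($href, PHP_URL_QUERY);
    if (!is_string($queryString)) {
        return null;                         // no query string to inspect
    }
    parse_str($queryString, $params);        // splits on '&' and URL-decodes values
    $target = $params['q'] ?? null;
    if (is_string($target) && preg_match('#^https?://#i', $target)) {
        return $target;
    }
    return null;                             // not a usable result link
}

$hrefs = [
    '/url?q=http://www.cnn.com/2014/01/25/travel/lisbon-coolest-city/&sa=U&ei=abc',
    'https://example.com/direct',
    '/search?q=lisbon&tbm=isch',             // internal search link: skipped
];
print_r(array_values(array_filter(array_map('resolveResultLink', $hrefs))));
```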