ben_demott at hotmail dot com
15 years ago
This is one of several functions I wrote some time ago to parse a Google results page. Google has probably changed its markup since then, but I thought this might be helpful to someone.

I'm moving servers, but I will probably throw this up on my blog when I get it back up.

<?php

function googleResult($listItem) {
    // Given a list item (<li>) element, validate it and return an array
    // describing that entry as an inline result from Google, or false on failure.
    /*
     * <li class='g w0'>
     *   <h3 class='r'>
     *     <a href='the URL' class='l'>
     *       Description <em>description</em>
     *     </a>
     *   </h3>
     * </li>
     *
     * UPDATE:
     * This function now looks for any subcontainer holding an anchor with an
     * href; it doesn't have to be an H3. This makes it work with a few more
     * search result formats.
     */

    $anchor = null;
    // Yes I don't use instanceof - I guess you'll have to deal.
    foreach ($listItem->childNodes as $element) {
        if (is_object($element) && get_class($element) == 'DOMElement' && $element->hasChildNodes()) {
            // Only the first child element that has children is inspected.
            foreach ($element->childNodes as $element2) {
                if (is_object($element2) && get_class($element2) == 'DOMElement'
                    && $element2->nodeName == 'a' && $element2->hasAttribute('href')) {
                    // Keep the first anchor that carries an href.
                    $anchor = $element2;
                    break;
                }
            }
            break;
        }
    }
    if (empty($anchor) || !is_object($anchor) || get_class($anchor) != 'DOMElement') {
        // No anchor was located during iteration.
        return false;
    }
    $href = $anchor->getAttribute('href');
    if (empty($href)) {
        // The anchor was found, but its href attribute is empty.
        return false;
    }
    // Build the description from the anchor's direct text, <em>, and <b> children.
    $urlDescription = '';
    foreach ($anchor->childNodes as $words) {
        $name = trim($words->nodeName);
        if ($name == 'em' || $name == '#text' || $name == 'b') {
            $text = trim((string) $words->nodeValue);
            // Skip whitespace-only nodes so the pieces are joined by single spaces.
            if ($text !== '') {
                $urlDescription .= $text . ' ';
            }
        }
    }
    $urlDescription = htmlspecialchars_decode($urlDescription, ENT_QUOTES);
    $urlDescription = trim($urlDescription);
    return array('description' => $urlDescription, 'href' => $href);
}

?>
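
For anyone who would rather not walk childNodes by hand, here is a rough sketch of the same idea using DOMXPath. This is not the function above. The helper name googleResultsFromHtml() and the sample markup are made up for illustration. It also differs in two ways: it uses textContent, so it picks up text from every descendant of the anchor rather than only direct text, em, and b children, and it looks at every child container of the li rather than only the first. Treat it as a starting point, since Google's real markup has almost certainly changed.

```php
<?php

// Hypothetical helper (not part of the original note): collect
// description/href pairs for every <li> in an HTML string.
function googleResultsFromHtml(string $html): array
{
    $doc = new DOMDocument();

    // Real-world markup is rarely valid; collect libxml warnings quietly.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $results = [];

    foreach ($xpath->query('//li') as $li) {
        // First anchor with an href inside any direct child container of the <li>.
        $anchor = $xpath->query('./*/a[@href]', $li)->item(0);
        if (!$anchor instanceof DOMElement) {
            continue;
        }

        $href = $anchor->getAttribute('href');
        if ($href === '') {
            continue;
        }

        // Collapse runs of whitespace so the description reads as one line.
        $description = trim(preg_replace('/\s+/', ' ', $anchor->textContent));

        $results[] = ['description' => $description, 'href' => $href];
    }

    return $results;
}

// Sample markup modelled on the structure described in the comment above.
$sample = "<ul>"
    . "<li class='g w0'><h3 class='r'>"
    . "<a href='http://example.com/' class='l'>Example <em>site</em></a>"
    . "</h3></li>"
    . "<li>no link here</li>"
    . "</ul>";

$results = googleResultsFromHtml($sample);
print_r($results);
```

The second list item has no anchor, so it is skipped instead of producing a false entry. That is the main design difference from googleResult(), which returns false for a single li and leaves the looping to the caller.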
