Sunday, February 4, 2007

Recursive directory traversal

Today I needed to traverse a directory structure with millions of files and match them against an existing database, in order to free up some storage.

At least one good thing came out of it: a nice, clean recursive directory traversal function for whenever you need to process a directory tree. It uses hooks, so you can implement whatever action you need for each file and directory.

Simple to use:
process_dir("/path/to/dir", "filehook", "dirhook", 2);
Here "filehook" and "dirhook", if set, are passed to call_user_func (so you can also use class methods as hooks), and 2 is the maximum number of directory levels to descend into.
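Since the hooks go through call_user_func, an array of object and method name should work as a hook too. Here is a minimal sketch of that idea; the FileCounter class and its onFile method are made up for illustration, and the two call_user_func lines only imitate what process_dir would do for each file it finds.

```php
<?php
// Hypothetical helper class for illustration (not part of the original code).
class FileCounter
{
    public $count = 0;

    // Same signature as a file hook: directory path, then file name.
    public function onFile($path, $file)
    {
        $this->count++;
        return true; // returning true keeps the traversal going
    }
}

$counter = new FileCounter();
$hook = array($counter, "onFile"); // "object + method name" callable

// Imitate the calls process_dir would make for two files.
call_user_func($hook, "/path/to/dir", "a.txt");
call_user_func($hook, "/path/to/dir", "b.txt");

print $counter->count . "\n";
```

With the function from this post, the same array would simply be passed as the second argument: process_dir("/path/to/dir", array($counter, "onFile"));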

Example file and directory hook functions:

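<?php
// Called for every directory found; return false to stop processing the current directory.
function dirhook($path, $dir)
{
  print "dirhook: " . $path . DIRECTORY_SEPARATOR . $dir . "\n";
  return true;
}

// Called for every file found; return false to stop processing the current directory.
function filehook($path, $file)
{
  print "filehook: " . $path . DIRECTORY_SEPARATOR . $file . "\n";
  return true;
}
?>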



If either dirhook or filehook returns false, processing of the current directory is aborted.
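To make the abort rule concrete, here is a small sketch of a hook that gives up on a directory as soon as it sees a marker file. The ".skip" file name, the "/tmp/demo" path and the list of entries are all made up, and the foreach loop only imitates the way the traversal loop consults the hook; it does not touch the file system.

```php
<?php
$seen = array();

// Hypothetical file hook: record files until a ".skip" marker shows up.
$stop_at_marker = function ($path, $file) use (&$seen) {
    if ($file === ".skip") {
        return false; // tells the traversal to abort this directory
    }
    $seen[] = $path . DIRECTORY_SEPARATOR . $file;
    return true;
};

// Imitate the traversal loop for one directory with three entries.
foreach (array("a.txt", ".skip", "b.txt") as $file) {
    if (!call_user_func($stop_at_marker, "/tmp/demo", $file)) {
        break; // remaining entries in this directory are not processed
    }
}

print implode("\n", $seen) . "\n";
```

Only a.txt is recorded here; b.txt is never passed to the hook because the loop stops at the marker. Note that readdir returns entries in no particular order, so in a real traversal you can't count on which files are seen before the hook returns false.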

So, here's the code. Send me a note if you use it or have suggestions for improvements, ok?



<?php
/**
 * Recursive directory traversal function.
 *
 * Author: orIgo (mrorigo@gmail.com)
 * Use, modify and share, but leave my name in here, ok?
 *
 * @param $path      Path of start directory
 * @param $filehook  File callback function
 * @param $dirhook   Directory callback function
 * @param $maxdepth  Max levels of directories to descend into
 * @param $depth     Current depth (used internally by the recursion)
 */
function process_dir($path,
                     $filehook=null,
                     $dirhook=null,
                     $maxdepth=null,
                     $depth=0)
{
  // Stop once we are below the requested depth (no limit if $maxdepth is not set)
  if($maxdepth && $depth > $maxdepth)
    return;

  $dir = opendir($path);
  if(!$dir)
    return;  // PHP generates a warning if opendir fails, no need to print more

  while (false !== ($file = readdir($dir)))
  {
    if($file !== "." && $file !== "..")
    {
      $fullpath = $path . DIRECTORY_SEPARATOR . $file;
      if(is_dir($fullpath))
      {
        // Directory: call the hook first, then descend into it
        if($dirhook)
          if(!call_user_func($dirhook, $path, $file))
            break;
        process_dir($fullpath, $filehook, $dirhook, $maxdepth, $depth+1);
      }
      else {
        if($filehook)
          if(!call_user_func($filehook, $path, $file))
            break;
      }
    }
  }
  closedir($dir);
}

?>




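Immediate update: For better performance under some circumstances, change

if(is_dir($fullpath)) {
to:
if($maxdepth > $depth+1 && is_dir($fullpath)) {

This avoids unnecessary stat() calls when you're not interested in the subdirectories. Two things to keep in mind: entries at the depth limit are no longer checked with is_dir, so directories there end up in the else branch and are passed to filehook instead of dirhook. And if you leave $maxdepth unset (null), the comparison is always false, so nothing is descended into at all. Only make this change if you always pass a $maxdepth.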
