Monday, January 9, 2012

The disappointing truth of preg_match on Windows!



Lately I have been working on extracting some information from HTML documents. When you do that, all you need is a parser and, sometimes, regular expressions. It is supposed to be an easy job, isn't it?
Here is the problem: a regular expression like the following will crash Apache on Windows.
$matches = array();
preg_match('/<p>(?<value>(.)+)<.p>/i', $htmlString, $matches);
var_dump($matches);
After a lot of searching for why this was crashing the server, it turned out the problem was the stack size given to Apache on Windows. So how can you fix that?

First solution:
Download this tool: http://www.ntcore.com/exsuite.php. It lets you change the stack size of httpd.exe. If you put an extra "FF" on the left of the value, this will give you a lot of space.

Second solution:
Why the hell is the stack running out of space anyway? The document was only 50,000 characters long, so what is the big deal? The problem was the (.)+ part: to match this expression, the built-in regex library has to make a lot of recursive calls and keep a lot of backtracking state, roughly one level for every character the group matches. If you replace this part with a class of the characters that can actually appear, it will generate far fewer recursive calls, without the need to change anything in the server.
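Here is a minimal sketch of that second solution. It assumes the paragraph contains only plain text, so "any character except <" is a good enough description of what can appear. The sample $htmlString is made up for illustration; your real document and your real character class will differ.

```php
<?php
// Hypothetical sample input of about 50,000 characters (an assumption for
// this sketch, standing in for the real HTML document).
$htmlString = '<p>' . str_repeat('some text ', 5000) . '</p>';

$matches = array();

// [^<]+ matches a run of non-"<" characters directly, instead of wrapping
// every single character in its own capturing group like (.)+ does.
$result = preg_match('/<p>(?<value>[^<]+)<\/p>/i', $htmlString, $matches);

if ($result === 1) {
    // The named group is available as $matches['value'].
    echo strlen($matches['value']), "\n";
} elseif ($result === false) {
    // preg_match returns false on an internal error such as hitting a limit.
    echo 'preg_match failed, error code: ', preg_last_error(), "\n";
} else {
    echo "no match\n";
}
```

Keep in mind that [^<]+ will not match a paragraph that contains nested tags, so adjust the character class to whatever your documents really contain.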

Third solution:
If you aren't OK with the above solutions and you don't want to change anything, you can run the script from the command-line interface.

I hope this article saves you some time.