Entries tagged javascript

Contribute to wappalyzer

Posted on 20. Mai 2020 Comments

Wappalyzer is an open source website analyzer written in node.js. It basically just parses a big json file and uses regular expressions to find patterns in websites HTML, CSS, JavaScript and Server Headers.

  1. Fork the repository https://github.com/AliasIO/wappalyzer
  2. Clone the fork to your computer
  3. Install Docker
  4. ./run links
  5. Write Regex
  6. Check with regex101.com or similar tool
  7. Add valid json to apps.json
  8. Add a 32×32, 64×64 PNG or SVG to the icons folder
  9. Commit to another branch on your fork
  10. Push
  11. Create Pull Request, showing that it’s a relevant project: 1k+ stars on GitHub, Pages using it etc

I just created a pull request for Alpine.js, A rugged, minimal framework for composing JavaScript behavior in your markup, that I like to pull in when Vue or React would be overkill.

Crawling with Web Storage in PHP

Posted on 28. September 2019 Comments

Some pages have an extra screen before the real site appears, e.g. for age verification (like porn, for now anyways, or drug sites) or country/currency selection (like travel and big company sites). As far as I researched this, sometimes the selection is stored in a cookie, sometimes in Web Storage.

Crawling with Cookies is no problem with Guzzle.

But for example, at https://cannabis.wiki the age verification is stored in the Local Storage as you can see in the Chrome DevTools:

image

Using spatie/crawler we can inject a Browsershot (puppeteer) instance, that sets the localStorage key ageConfirmed through custom JavaScript:

$js = "localStorage.setItem('ageConfirmed', '1');";
$browsershot->setOption('addScriptTag', json_encode(['content' => $js]));

However, the code is only injected after the side is already loaded. Therefore we have to use JavaScript to reload the page:

$js = "localStorage.setItem('ageConfirmed', '1');location.reload()";
$browsershot->setOption('addScriptTag', json_encode(['content' => $js]));

You can test this on the console e.g. by using ->save($pathToFile); and have a look at the screenshot to see if it worked.