In rule-based web scraping, even a slight change in a website's layout breaks the process, forcing a script overhaul to adapt to the new structure. With machine learning (ML), you don't have to build or readjust a dedicated parser for each individual web page: a trained model keeps recognizing prices, descriptions, or any other fields it was trained on, even after the layout changes.
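To make the contrast concrete, here is a minimal sketch (the HTML snippet, class names, and prompt are illustrative, not taken from the webinar): a CSS-selector parser silently breaks the moment a class name changes, while an LLM prompt only describes *what* to extract, not *where* it lives in the markup.

```python
from bs4 import BeautifulSoup

html = '<div class="product"><span class="price-v2">$19.99</span></div>'

# Rule-based parsing: tied to an exact class name, so it returns nothing
# after the site renames "price" to "price-v2"
soup = BeautifulSoup(html, "html.parser")
price_tag = soup.select_one("span.price")            # old selector no longer matches
price = price_tag.get_text() if price_tag else None
print(price)                                          # None -> the scraper silently breaks

# ML-based parsing: the prompt describes the field, not its location in the DOM
prompt = (
    "Extract the product price from this HTML and return it as plain text:\n" + html
)
# ...send `prompt` to an LLM (e.g. the ChatGPT API) instead of maintaining selectors
```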
Tune in to the Oxylabs webinar to grasp the ins and outs of ML-based parsing. Tadas Gedgaudas, a developer at Oxylabs, shared his knowledge of large language models – ChatGPT in this case – and their integration into the web scraping process.
Tadas has covered the following:
➡️ The nuances of structuring data with and without ML.
➡️ A walkthrough of getting, preparing, and submitting data to ChatGPT.
➡️ A detailed demo of combining ChatGPT with Oxylabs Web Scraper API to scrape and parse web pages without building your own tools.
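As a rough illustration of that combination, the sketch below fetches a page through the Web Scraper API and hands the raw HTML to ChatGPT for parsing. The endpoint, payload shape, credentials, model name, prompt, and field list are assumptions made for this example rather than the exact code from the demo; for the real implementation, see the Oxy® Parser repository linked at the end of this post.

```python
import json
import requests
from openai import OpenAI

# 1. Fetch the page through the Web Scraper API
#    (endpoint and payload are assumptions; credentials are placeholders)
payload = {"source": "universal", "url": "https://example.com/product/123"}
scrape = requests.post(
    "https://realtime.oxylabs.io/v1/queries",
    auth=("OXYLABS_USERNAME", "OXYLABS_PASSWORD"),
    json=payload,
    timeout=60,
)
html = scrape.json()["results"][0]["content"]

# 2. Ask ChatGPT to turn the raw HTML into structured fields
#    (model name and field list are illustrative)
client = OpenAI()  # reads OPENAI_API_KEY from the environment
completion = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{
        "role": "user",
        "content": (
            "Extract the product title, price, and description from the HTML below. "
            "Respond with a JSON object only.\n\n" + html
        ),
    }],
)

# Assumes the model replies with JSON only, as the prompt requests
parsed = json.loads(completion.choices[0].message.content)
print(parsed)
```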
The webinar is an essential stepping stone for developers and decision-makers, showing how ML-enabled parsing saves time, drastically cuts maintenance, and turns raw web pages into structured data.
For your convenience, Tadas has provided code samples from his presentation. You can access the open-source Oxy® Parser library here:
https://github.com/oxylabs/OxyParser