Extracting Data from an AJAX-enabled Web Site

  • Author Tracy Morgan
  • Published January 20, 2012
  • Word count 470

AJAX is much of what makes web sites interactive. It stands for Asynchronous JavaScript and XML. Built on the XMLHttpRequest object, it lets a web page contact the server and retrieve data (referred to as an "AJAX callback") without reloading the page. In one simplified example, it removes the flicker you would otherwise see when you hit the "Submit" button, giving the website a more streamlined feel. As a result, web sites have become more complicated, and so has web mining. Visual Web Ripper can still work with such sites, although you need to know how to guide it through the process.

How To

Before starting, you must tell Visual Web Ripper (VWR) what changes will occur on the website. To keep data from being extracted prematurely, VWR needs to wait for the callback to finish.
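To see why the wait matters, here is a minimal Python sketch. It is only an illustration and has nothing to do with VWR's internals: FakePage, click_search and the "42 items found" text are all made up for the example. The page fills in its results a moment after the click, the way an AJAX callback would.

```python
import threading


class FakePage:
    """Hypothetical stand-in for a page whose content arrives via an AJAX callback."""

    def __init__(self):
        self.results = ""                  # the container is empty when the page first renders
        self.loaded = threading.Event()    # set once the simulated callback has finished

    def click_search(self, delay=0.2):
        # Simulate the asynchronous callback: content arrives later, with no page reload.
        def callback():
            self.results = "42 items found"
            self.loaded.set()

        threading.Timer(delay, callback).start()


page = FakePage()
page.click_search()

too_early = page.results        # read right after the click: the callback has not fired yet
page.loaded.wait(timeout=2)     # wait for the callback to finish
after_wait = page.results       # now the content is there

print(repr(too_early), repr(after_wait))
```

Reading the container right after the click gives nothing to extract. Reading it after the wait gives the data.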

To build a working template for an AJAX site, go to Options and select the AJAX or JavaScript radio button. If necessary, use the Wait for element drop-down box to select the proper element; otherwise the default Wait Element will be the first content element in the template.

VWR will handle an AJAX action in three steps, unless told otherwise:

1. Clicks on the selected link or web form button.

2. Waits for the Wait Element and/or its container to change. This step is treated as complete automatically if the content doesn't exist yet.

3. Waits for the Wait Element to appear on the web page.
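The three steps above can be sketched in a few lines of Python. This is a rough model under stated assumptions, not VWR's actual code: wait_until, FakePage and click_and_wait are hypothetical names, and the fake page simply delivers its new content after a few reads so the example runs without a browser.

```python
import time


def wait_until(predicate, timeout=2.0, interval=0.01):
    """Poll predicate until it is true or the timeout elapses; return the final result."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if predicate():
            return True
        time.sleep(interval)
    return predicate()


class FakePage:
    """Hypothetical page: the callback result shows up after a few reads."""

    def __init__(self, elements=None):
        self.elements = dict(elements or {})
        self._pending = None
        self._reads_left = 0

    def click(self, element, new_text, reads=3):
        # Step 1 target: the click schedules new content for the given element.
        self._pending = (element, new_text)
        self._reads_left = reads

    def get(self, name):
        # Each read moves the simulated callback one tick closer to finishing.
        if self._pending is not None:
            self._reads_left -= 1
            if self._reads_left <= 0:
                element, text = self._pending
                self.elements[element] = text
                self._pending = None
        return self.elements.get(name)


def click_and_wait(page, wait_element, new_text):
    before = page.elements.get(wait_element)    # snapshot taken before the click
    page.click(wait_element, new_text)          # step 1: click the link or button
    if before is not None:                      # step 2: nothing to wait on if no content yet
        wait_until(lambda: page.get(wait_element) != before)
    appeared = wait_until(lambda: page.get(wait_element) is not None)  # step 3
    return page.get(wait_element) if appeared else None


page = FakePage({"results": "old results"})
extracted = click_and_wait(page, "results", "new results")
print(extracted)
```

Starting from a page with no "results" element at all, step 2 has nothing to compare against, so the sketch goes straight to step 3.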

Troubleshooting

Many web sites display placeholder text to tell users that content is being loaded (e.g. "Loading..."). VWR will treat the placeholder as the expected change and extract it as data. To counter this, use a Wait Script.
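The idea behind such a script can be shown in plain Python. This is not VWR's Wait Script syntax, which is not covered here. The placeholder list and the function names are assumptions made for the illustration, and the list of snapshots stands in for what the Wait Element contains over time.

```python
# Assumed placeholder strings; a real site may use different wording.
PLACEHOLDERS = {"", "Loading...", "Please wait..."}


def is_real_content(text):
    """True when the text is something other than a loading placeholder."""
    return text is not None and text.strip() not in PLACEHOLDERS


def naive_wait(before, snapshots):
    # Accepts the first change of any kind, so it can stop on "Loading...".
    return next((s for s in snapshots if s != before), None)


def scripted_wait(before, snapshots):
    # Accepts a change only when the new text is not a placeholder.
    return next((s for s in snapshots if s != before and is_real_content(s)), None)


# What the Wait Element contains at successive moments after the click.
snapshots = ["old results", "Loading...", "new results"]

print(naive_wait("old results", snapshots))
print(scripted_wait("old results", snapshots))
```

The naive wait stops at the placeholder, while the scripted wait holds out for the real content.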

Sometimes an AJAX callback results in no change, either to the web page as a whole or to content that was preloaded. If either scenario arises, check the appropriate box(es): "Wait is optional" and/or "Optional wait applies only to the first link in a list".
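One plausible reading of these two checkboxes is sketched below in Python. The function, its parameters and the tab names are hypothetical, and VWR's real behavior may differ in the details. The assumption is that a missing change normally counts as a failure, that "Wait is optional" tolerates it, and that the second checkbox limits that tolerance to the first link.

```python
def run_link_list(links, links_that_change, wait_optional=False, first_link_only=False):
    """Click each link in turn; fail if a required change never happens."""
    extracted = []
    for index, link in enumerate(links):
        changed = link in links_that_change
        # Is a missing change tolerated for this particular link?
        tolerated = wait_optional and (index == 0 or not first_link_only)
        if not changed and not tolerated:
            raise TimeoutError(f"no change after clicking {link!r}")
        extracted.append(link)
    return extracted


tabs = ["Overview", "Specs", "Reviews"]
changing = {"Specs", "Reviews"}    # "Overview" is preloaded, so clicking it changes nothing

print(run_link_list(tabs, changing, wait_optional=True, first_link_only=True))
```

With both boxes checked, the preloaded first tab no longer blocks the run, but a later tab that fails to change is still reported.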

All AJAX is JavaScript, but not all JavaScript is AJAX. If the code doesn't dynamically change content, it is plain JavaScript, so the "Wait is optional" and/or "Optional wait applies only to the first link in a list" boxes need to be checked.

VWR can see everything on the page, including hidden content, so no separate AJAX action template is necessary for content that is merely hidden. Switch to browser mode to find what you need.

AJAX is also used for content that appears after the page has rendered. In this scenario, go to Options, click Misc, and check Wait for element.

Iframes are tricky, since they look like AJAX callbacks to VWR. Use the Keep loading webpage until manual stop button on the toolbar to work around this issue.

Though AJAX sites can present hurdles when doing web mining, combining a powerful extraction tool such as Visual Web Ripper with the knowledge of how to navigate through it puts the target data within reach.

For more information about data scraping software, please visit www.visualwebripper.com

Article source: https://articlebiz.com