Screen scraping Windows applications

I’ve often needed to extract text from other Windows applications, for example from MSN Messenger or the Internet Explorer address bar. Instead of opening up Spy++ to get the hWnd and working with scraps of Win32 API calls, I decided to write a generic class that recursively extracts text from every window on the screen and stores it in an XmlDocument.

I used a VB6 version of this class before, so I decided to port it to .NET to modernize the code a bit, hence it’s written in VB.NET.

First I declared a few API functions thus:

' Window handles are declared as Integer, as in the VB6 original; IntPtr is the safer choice in a 64-bit process.
Private Declare Function GetWindow Lib "user32.dll" (ByVal hWnd As Integer, ByVal wCmd As Integer) As Integer
Private Declare Function GetDesktopWindow Lib "user32.dll" () As Integer
Private Declare Function SendMessage Lib "user32.dll" Alias "SendMessageA" (ByVal hWnd As Integer, ByVal Msg As Integer, ByVal wParam As Int32, ByVal lParam As Int32) As Integer
Private Declare Function SendMessage Lib "user32.dll" Alias "SendMessageA" (ByVal hWnd As Integer, ByVal Msg As Integer, ByVal wParam As Int32, ByVal lParam As String) As Integer
Private Const GW_HWNDNEXT As Short = 2 ' The window below the given window in the Z-order.
Private Const GW_CHILD As Short = 5 ' The topmost of the given window's child windows. This has the same effect as using the GetTopWindow function.
Private Const WM_GETTEXT As Short = &HD ' Get window text
Private Const WM_GETTEXTLENGTH As Short = &HE ' Get the length of the window text

Notice that SendMessage is declared twice, overloaded on the type of the lParam parameter. I found this to be a better approach than declaring the parameter as "Object", which kept causing OLE errors.

The entry function was simply

Public Function Analyze() As System.Xml.XmlDocument
    Return Analyze(GetDesktopWindow())
End Function

This then calls the Analyze(hWnd) function thus:


Function Analyze(ByVal hWnd As Integer) As System.Xml.XmlDocument
    Dim childHwnd As Integer
    Dim XMLRootElement As Xml.XmlElement
    Dim XMLRoot As Xml.XmlElement
    xmlDoc = New XmlDocument ' xmlDoc is a class-level field shared with the helper routines
    childHwnd = GetWindow(hWnd, GW_CHILD)
    XMLRootElement = xmlDoc.CreateElement("window")
    XMLRoot = xmlDoc.AppendChild(XMLRootElement)
    ' The original call is cut off after hWnd; the last two arguments
    ' (parent handle 0, Z-order position 1) are assumptions.
    AppendAnalysis(XMLRoot, hWnd, 0, 1)
    recurseChildren(XMLRoot, childHwnd, hWnd)
    Return xmlDoc
End Function

Which then calls the recursive function recurseChildren

Private Sub recurseChildren(ByRef XMLParent As System.Xml.XmlElement, ByRef childHwnd As Integer, ByRef parentHwnd As Integer)
    Dim grandChildHwnd As Integer
    Dim ZCount As Integer = 1
    Dim XMLChildElement As XmlElement
    Dim XMLChild As XmlElement
    ' Walk the siblings in Z-order; GetWindow returns 0 when there are no more.
    Do Until childHwnd = 0
        XMLChildElement = xmlDoc.CreateElement("window")
        XMLChild = XMLParent.AppendChild(XMLChildElement)
        AppendAnalysis(XMLChild, childHwnd, parentHwnd, ZCount)
        ZCount = ZCount + 1
        ' Descend into this window's children before moving to the next sibling.
        grandChildHwnd = GetWindow(childHwnd, GW_CHILD)
        recurseChildren(XMLChild, grandChildHwnd, childHwnd)
        childHwnd = GetWindow(childHwnd, GW_HWNDNEXT)
    Loop
End Sub

This calls AppendAnalysis to build up the XmlDocument

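Private Sub AppendAnalysis(ByRef Parent As System.Xml.XmlElement, ByRef childHwnd As Integer, ByRef parentHwnd As Integer, ByRef ZCount As Integer)
    Dim Description As String
    Dim XMLNodeWindowText As XmlElement
    XMLNodeWindowText = xmlDoc.CreateElement("Text")
    XMLNodeWindowText.InnerText = getTextFromHwnd(childHwnd)
    ' Fall back to the Z-order position when the window has no text.
    ' Description is not used further in this excerpt.
    If XMLNodeWindowText.InnerText = "" Then
        Description = "#" & ZCount
    Else
        Description = XMLNodeWindowText.InnerText
    End If
    ' This line is absent from the original listing; it is inferred from the
    ' sample output, where each <window> contains a <Text> child.
    Parent.AppendChild(XMLNodeWindowText)
End Sub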

And finally, we get down and dirty with the Win32 API

Private Function getTextFromHwnd(ByVal hwnd As Integer) As String
    ' use <?xml version="1.0" encoding="UTF-8"?> as a header.
    Dim wintext As String ' receives the copied text from the target window
    Dim slength As Integer ' length of the window text
    Dim retval As Integer ' return value of message (number of characters copied)
    slength = SendMessage(hwnd, WM_GETTEXTLENGTH, 0, 0) + 1 ' +1 for the terminating null
    wintext = New String(ChrW(0), slength)
    retval = SendMessage(hwnd, WM_GETTEXT, slength, wintext)
    wintext = wintext.Substring(0, retval)
    Return wintext
End Function

Here, I used SendMessage with WM_GETTEXT rather than GetWindowText, since I found that the SendMessage technique works better across process boundaries (GetWindowText cannot retrieve the text of a control in another application).
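As a side note, the declarations above use the ANSI entry point (SendMessageA) and 32-bit Integer handles. The following is only a rough sketch of how the same WM_GETTEXT technique might look with the Unicode entry point, IntPtr handles and a StringBuilder buffer. It assumes .NET 2.0 or later, and the module name WindowText is a hypothetical name chosen for illustration; it is not part of the original class.

```vbnet
Imports System
Imports System.Runtime.InteropServices
Imports System.Text

' Hypothetical module name; a sketch, not the original class.
Module WindowText
    Private Const WM_GETTEXT As Integer = &HD
    Private Const WM_GETTEXTLENGTH As Integer = &HE

    ' Overload used for messages whose lParam is a plain number.
    <DllImport("user32.dll", CharSet:=CharSet.Unicode, EntryPoint:="SendMessageW")> _
    Private Function SendMessage(ByVal hWnd As IntPtr, ByVal msg As Integer, _
                                 ByVal wParam As IntPtr, ByVal lParam As IntPtr) As IntPtr
    End Function

    ' Overload used for WM_GETTEXT, where lParam points at a text buffer.
    <DllImport("user32.dll", CharSet:=CharSet.Unicode, EntryPoint:="SendMessageW")> _
    Private Function SendMessage(ByVal hWnd As IntPtr, ByVal msg As Integer, _
                                 ByVal wParam As IntPtr, ByVal lParam As StringBuilder) As IntPtr
    End Function

    Public Function GetTextFromHwnd(ByVal hWnd As IntPtr) As String
        ' Ask the window how long its text is (not counting the terminating null).
        Dim length As Integer = SendMessage(hWnd, WM_GETTEXTLENGTH, IntPtr.Zero, IntPtr.Zero).ToInt32()
        ' Reserve room for the text plus the terminating null.
        Dim buffer As New StringBuilder(length + 1)
        SendMessage(hWnd, WM_GETTEXT, New IntPtr(buffer.Capacity), buffer)
        Return buffer.ToString()
    End Function
End Module
```

With a StringBuilder the marshaller trims the result at the terminating null, so the Substring step from the listing above is not needed in this variant.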

The resultant XML looks like this

<window><Text></Text><window><Text></Text></window><window><Text>AppAnalyze.NET – Microsoft Visual Basic .NET [run] – Appanalyze.vb [Read Only]</Text>…
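
Since most windows have no text, the document tends to be full of empty Text elements. One way to pull out only the useful entries is an XPath query over the returned XmlDocument. This is a minimal sketch, assuming .NET 2.0 or later for List(Of String); the module and function names are hypothetical, and the XML in Main is a made-up sample in the same shape as the output above, not real captured data.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Xml

' Hypothetical helper module, not part of the original class.
Module WindowTextQuery
    ' Returns the text of every <window> whose <Text> child is non-empty.
    Public Function NonEmptyWindowTexts(ByVal doc As XmlDocument) As List(Of String)
        Dim texts As New List(Of String)
        ' //window matches windows at any depth; the predicate skips blank <Text> elements.
        For Each node As XmlNode In doc.SelectNodes("//window/Text[normalize-space(.) != '']")
            texts.Add(node.InnerText)
        Next
        Return texts
    End Function

    Sub Main()
        ' Made-up sample input with one empty and one non-empty window.
        Dim doc As New XmlDocument()
        doc.LoadXml("<window><Text></Text><window><Text>Calculator</Text></window></window>")
        For Each t As String In NonEmptyWindowTexts(doc)
            Console.WriteLine(t) ' prints "Calculator"
        Next
    End Sub
End Module
```

In practice the document passed in would be the one returned by Analyze() rather than a hand-written string.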

  1. Unknown
    April 29, 2005 at 8:17 pm

    This code works great. But is there any way I can extract the child windows' names along with the text? I would like to be able to read the field names inside of the form. Thanks, Rovi
