Screen scraping Windows applications
I’ve often needed to extract text from other Windows applications, for example from MSN Messenger or the Internet Explorer address bar. Instead of opening up Spy++ to get the hWnd and working with scraps of Win32 API calls, I decided to write a generic class which recursively extracts text from every window on the screen and stores it in an XmlDocument.
I had used a VB6 version of this class before, so I decided to port it to .NET to modernize the code a bit, hence it’s written in VB.NET.
First I declared a few API functions thus:
Private Declare Function GetWindow Lib "user32.dll" (ByVal hWnd As Integer, ByVal wCmd As Integer) As Integer
Private Declare Function GetDesktopWindow Lib "user32.dll" () As Integer
Private Declare Function SendMessage Lib "user32.dll" Alias "SendMessageA" (ByVal hWnd As Integer, ByVal Msg As Integer, ByVal wParam As Int32, ByVal lParam As Int32) As Integer
Private Declare Function SendMessage Lib "user32.dll" Alias "SendMessageA" (ByVal hWnd As Integer, ByVal Msg As Integer, ByVal wParam As Int32, ByVal lParam As String) As Integer
Private Const GW_HWNDNEXT As Short = 2 ' The window below the given window in the Z-order.
Private Const GW_CHILD As Short = 5 ' The topmost of the given window's child windows. This has the same effect as using the GetTopWindow function.
Private Const WM_GETTEXT As Short = &HDS ' Get window text
Private Const WM_GETTEXTLENGTH As Short = &HES ' Get the length of window text
Notice that SendMessage is declared twice, overloaded on the type of the lParam parameter. I found this to be a better approach than declaring the parameter as "Object", which kept causing OLE errors.
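On 64-bit Windows, window handles and message parameters are pointer-sized, so Integer-based declarations can truncate them. The same overloading idea can be sketched with DllImport, IntPtr and StringBuilder. This is only a sketch under those assumptions: the module name NativeMethods and the helper GetText are hypothetical names chosen for illustration, not part of the original class.

```vbnet
Imports System
Imports System.Runtime.InteropServices
Imports System.Text

' Hypothetical module name; not part of the original class.
Friend Module NativeMethods
    Public Const WM_GETTEXT As Integer = &HD
    Public Const WM_GETTEXTLENGTH As Integer = &HE

    ' Overload for messages whose lParam is a plain number.
    <DllImport("user32.dll", CharSet:=CharSet.Auto)> _
    Public Function SendMessage(ByVal hWnd As IntPtr, ByVal msg As Integer, _
                                ByVal wParam As IntPtr, ByVal lParam As IntPtr) As IntPtr
    End Function

    ' Overload for messages that write text into a caller-supplied buffer.
    <DllImport("user32.dll", CharSet:=CharSet.Auto)> _
    Public Function SendMessage(ByVal hWnd As IntPtr, ByVal msg As Integer, _
                                ByVal wParam As IntPtr, ByVal lParam As StringBuilder) As IntPtr
    End Function

    ' Hypothetical helper: asks for the text length, then fetches the text.
    Public Function GetText(ByVal hWnd As IntPtr) As String
        Dim length As Integer = SendMessage(hWnd, WM_GETTEXTLENGTH, IntPtr.Zero, IntPtr.Zero).ToInt32()
        ' One extra character leaves room for the terminating null.
        Dim buffer As New StringBuilder(length + 1)
        SendMessage(hWnd, WM_GETTEXT, New IntPtr(buffer.Capacity), buffer)
        Return buffer.ToString()
    End Function
End Module
```

With CharSet.Auto the runtime picks the Unicode entry point on NT-based Windows, so no explicit Alias is needed in this variant.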
The entry function was simply:
Public Function Analyze() As System.Xml.XmlDocument
    Return Analyze(GetDesktopWindow())
End Function
This then calls the Analyze(hWnd) function thus:
Private Function Analyze(ByVal hWnd As Integer) As System.Xml.XmlDocument
    Dim childHwnd As Integer
    Dim XMLRootElement As Xml.XmlElement
    Dim XMLRoot As Xml.XmlElement
    xmlDoc = New XmlDocument
    childHwnd = GetWindow(hWnd, GW_CHILD)
    XMLRootElement = xmlDoc.CreateElement("window")
    XMLRoot = xmlDoc.AppendChild(XMLRootElement)
    AppendAnalysis(XMLRoot, hWnd, Nothing, 0)
    recurseChildren(XMLRoot, childHwnd, hWnd)
    Return xmlDoc
End Function
This in turn calls the recursive function recurseChildren:
Private Sub recurseChildren(ByRef XMLParent As System.Xml.XmlElement, ByRef childHwnd As Integer, ByRef parentHwnd As Integer)
    Dim grandChildHwnd As Integer
    Dim ZCount As Integer = 1
    Dim XMLChildElement As XmlElement
    Dim XMLChild As XmlElement
    Do Until childHwnd = 0
        XMLChildElement = xmlDoc.CreateElement("window")
        XMLChild = XMLParent.AppendChild(XMLChildElement)
        AppendAnalysis(XMLChild, childHwnd, parentHwnd, ZCount)
        ZCount = ZCount + 1
        grandChildHwnd = GetWindow(childHwnd, GW_CHILD)
        recurseChildren(XMLChild, grandChildHwnd, childHwnd)
        childHwnd = GetWindow(childHwnd, GW_HWNDNEXT)
    Loop
End Sub
This calls AppendAnalysis to build up the XmlDocument:
Private Sub AppendAnalysis(ByRef Parent As System.Xml.XmlElement, ByRef childHwnd As Integer, ByRef parentHwnd As Integer, ByRef ZCount As Integer)
    Dim Description As String
    Dim XMLNodeWindowText As XmlElement
    XMLNodeWindowText = xmlDoc.CreateElement("Text")
    XMLNodeWindowText.InnerText = getTextFromHwnd(childHwnd)
    Parent.AppendChild(XMLNodeWindowText)
    If XMLNodeWindowText.InnerText = "" Then
        Description = "#" & ZCount
    Else
        Description = XMLNodeWindowText.InnerText
    End If
End Sub
And finally, we get down and dirty with the Win32 API:
Private Function getTextFromHwnd(ByVal hwnd As Integer) As String
    ' use <?xml version="1.0" encoding="UTF-8"?> as a header.
    Dim wintext As String ' receives the copied text from the target window
    Dim slength As Integer ' length of the window text
    Dim retval As Integer ' return value of message
    slength = SendMessage(hwnd, WM_GETTEXTLENGTH, 0, 0) + 1
    wintext = New String(ChrW(0), slength)
    retval = SendMessage(hwnd, WM_GETTEXT, slength, wintext)
    wintext = wintext.Substring(0, retval)
    Return wintext
End Function
Here, I used SendMessage with WM_GETTEXT rather than GetWindowText, since I found that the SendMessage technique works better across process boundaries.
The resultant XML looks like this:
<window><Text></Text><window><Text></Text></window><window><Text>AppAnalyze.NET – Microsoft Visual Basic .NET [run] – Appanalyze.vb [Read Only]</Text>…
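Once the document is built, the nested window/Text structure can be searched with XPath instead of walking the tree by hand. The following is a minimal sketch, assuming XML in the shape shown above; the module name QuerySketch and the sample window titles are invented for illustration and do not come from a real capture.

```vbnet
Imports System
Imports System.Xml

' Hypothetical module name, used only for this illustration.
Module QuerySketch
    Sub Main()
        ' Sample data in the same shape as the output above; the window
        ' titles here are invented.
        Dim doc As New XmlDocument()
        doc.LoadXml("<window><Text></Text>" & _
                    "<window><Text></Text></window>" & _
                    "<window><Text>Calculator</Text>" & _
                    "<window><Text>Backspace</Text></window></window>" & _
                    "</window>")

        ' Select every window, at any depth, whose Text child is not empty.
        For Each node As XmlNode In doc.SelectNodes("//window[Text != '']")
            Console.WriteLine(node.SelectSingleNode("Text").InnerText)
        Next
    End Sub
End Module
```

In real use, the XmlDocument returned by Analyze() would take the place of the hand-written sample, and the XPath predicate could be narrowed, for example to windows whose text contains a particular word.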
This code works great. But is there any way I can extract the child windows' names along with the text? I would like to be able to read the field names inside of the form. Thanks, Rovi