Today’s entry is the solution I wrote to speed up (and reduce errors in) some parts of my software about Quinielas (the Quiniela is a Spanish betting game where you bet on 15 football matches). I have used Python with BeautifulSoup. The goal is to scrape the prizes for the different categories and the volume of money for the Quiniela of the week. In previous posts I showed you how to install it, but now it’s time to experiment with a real problem I had.
The objective
The content to scrape is inside the red square. You can see the information about prizes and volumes:
The final goal is to save these 8 values to a text file.
In a previous post, I wrote about how to retrieve the HTML code of a website (also in C#, but here I am using the Python way). It is necessary to study the HTML code a bit to find the name of the DIV I am interested in. To solve this, I use the developer tools included in Firefox or Chrome.
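Just as a reminder, a minimal sketch of that download step could look like this (the URL is only a placeholder and I am assuming the requests library here; any other way of getting the page HTML works the same):

import requests
from bs4 import BeautifulSoup as bs

url = "https://example.com/quiniela-resultados"  # placeholder URL of the results page
data = requests.get(url).text  # raw HTML of the page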
The target DIV contains a table with the class "fill-table m1015". Let's create a BeautifulSoup object to get it:
soup = bs(data)
tEscritunio = soup.findAll("table", { "class" : "fill-table m1015" })
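Since findAll returns a list, a quick way to verify that the selector found the table is to check how many results it returned. Just a small sketch of the idea:

if len(tEscritunio) == 0:
    print("No table with that class was found")
else:
    print("Found " + str(len(tEscritunio)) + " table(s) with that class")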
We have to prepare our routine to detect when the data is NOT available yet:
I need to create two new functions:
- One to check that a row is valid,
- Another one to save the data into a text file.
For the first task, I need to validate the rows that contain valid strings (the function EsLineaValida checks the length and skips some strings I am not interested in, like "*" or "Euros"). I have built this function separately because maybe I will reuse it.
def EsLineaValida(linea):
    if len(linea) == 0:
        return False
    if "*" in linea:
        return False
    if "Euros" in linea:
        return False
    return True
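For example, with some made-up lines similar to the ones in the table, it behaves like this:

print(EsLineaValida(""))                    # False: empty line
print(EsLineaValida("* Pendiente"))         # False: contains "*"
print(EsLineaValida("1.234.567,89 Euros"))  # False: contains "Euros"
print(EsLineaValida("1.234.567,89"))        # True: a valid amount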
I am only interested in 8 rows: the total amount of money bet and the prizes for 15, 14, 13, 12, 11 and 10 matches hit. Only when this condition is met do I keep extracting the data and, for now, save it to a text file. If the condition is not met, I report that I couldn't get the data.
I have defined a function in order to save a list to a text file:
def saveList2File(file, lista):
    sfile = open(file, "w")
    for linea in lista:
        sfile.writelines(str(linea) + nl)
    sfile.close()
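Note that nl is a constant defined elsewhere in the script (presumably the end-of-line character). A quick hypothetical usage, with a made-up file name and values, would be:

nl = "\n"  # assumed definition of the end-of-line constant
saveList2File("escrutinio.txt", ["1.234.567,89", "98.765,43"])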
The main function is:
def Escrutinio(dataHTML, ficheroSalida, jorn):
    soup = bs(dataHTML)  # BeautifulSoup object for the HTML code
    tEscritunio = soup.findAll("table", { "class" : "fill-table m1015" })  # the results TABLE
    tr = []
    for mydiv in tEscritunio:
        rows = mydiv.findAll('tr')
        for row in rows:
            if EsLineaValida(row.text):
                tmp = row.text.strip().split(nl)[-1]  # last column of the row
                tmp = tmp.replace('.', '')  # remove the thousands separator
                tr.append(tmp)
    if len(tr) == 8:
        saveList2File(ficheroSalida, tr)
        print "Escrutinio retrieved!"
    else:
        print "Escrutinio data is NOT what was expected!" + nl, tr
        time.sleep(2)
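Putting it all together, the call could look like this (the URL and the output file name are only placeholders to illustrate it; the full script also imports time and defines bs and nl as shown above):

import requests

dataHTML = requests.get("https://example.com/quiniela-resultados").text  # placeholder URL
Escrutinio(dataHTML, "escrutinio.txt", 25)  # 25 would be the matchday (jornada)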
Here you can see the code running:
All the Python code is available on GitHub. You can use it.
As you can see, I am very happy with Python, because web scraping with Python and BeautifulSoup is very easy. In future posts, I will show more examples!
Have a nice day!