Python and BeautifulSoup: extracting prizes from Quiniela

Today's post is the solution I wrote to speed up (and reduce errors in) some parts of my software for Quinielas (the Quiniela is a Spanish betting game where you bet on 15 matches). I used Python with BeautifulSoup. The goal is to scrape the prizes for the different categories, and the amount of money collected for the Quiniela of the week. In previous posts I showed you how to install it, but now it's time to experiment with a real problem I had.

The objective

The content to scrape is inside the red square. You can see the information about prizes and amounts:

Area of interest: Escrutinio

The final goal is to save these 8 values to a text file.

In a previous post, I wrote about how to retrieve the HTML code of a website (also in C#, but here I am using the Python way). It is necessary to study the HTML code a bit to find the name of the DIV I am interested in. To do this, I use the developer tools included in Firefox or Chrome.

Inspecting the area of interest

Inspecting the target DIV

The target DIV contains a table with the class "fill-table m1015". Let's create a BeautifulSoup object to get it:

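# bs is assumed to be an alias for the BeautifulSoup class,
# e.g. "from bs4 import BeautifulSoup as bs"
soup = bs(data)
tEscritunio = soup.findAll("table", { "class" : "fill-table m1015" })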

We have to prepare our routine to handle the case where the data is NOT available yet:

Revenue figures that are NOT of interest

Target area with NO data

I need to create two new functions:

  • One to check that a row is valid,
  • Another to save the data to a text file.

For the first task, I need to count the rows that contain valid strings. The function EsLineaValida checks the length and rejects the strings I'm not interested in, such as those containing "*" or "Euros". I built it as a separate function because I may reuse it.

def EsLineaValida(linea):
    # Reject empty rows and rows containing "*" or "Euros"
    if len(linea) == 0:
        return False
    if "*" in linea:
        return False
    if "Euros" in linea:
        return False
    return True
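
To see how this filter behaves, here is a small sketch that applies the same three checks to a few row strings. The sample rows are made up for illustration (they are not real data from the page), and the function is repeated only so the snippet runs on its own:

```python
def EsLineaValida(linea):
    """Same checks as in the post, repeated so the sketch is self-contained."""
    if len(linea) == 0:
        return False
    if "*" in linea or "Euros" in linea:
        return False
    return True

# Hypothetical row texts, invented for illustration only
filas = [
    "",                                  # empty row
    "Categoria\nAcertantes\nEuros",      # header row
    "* Nota al pie",                     # footnote row
    "1a (14 aciertos)\n12\n34.567,89",   # a row with data
]

# Keep only the rows that pass the filter
validas = [fila for fila in filas if EsLineaValida(fila)]
print(len(validas))
```

Only the last sample row passes, because the other three are empty or contain one of the rejected markers.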

I am only interested in 8 rows, which include the total amount of money bet and the prizes for 15, 14, 13, 12, 11 and 10 correct matches. Only when exactly 8 valid rows are found do I go on extracting the data and, for now, save it to a text file. If the condition is not met, I report that I couldn't get the data.

I have defined a function to save a list to a text file:

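nl = "\n"  # newline string used by the script (assumed; its definition is not shown in the post)

def saveList2File(file, lista):
    # "with" closes the file when the block ends
    with open(file, "w") as sfile:
        for linea in lista:
            sfile.write(str(linea) + nl)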
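
As a quick check of the idea, here is a minimal sketch of a save function written with a "with" block, followed by a round trip through a temporary file. The name save_list_to_file, the sample values and the file name are assumptions made for this example:

```python
import os
import tempfile

def save_list_to_file(path, lista):
    # The with-statement closes the file when the block ends
    with open(path, "w") as sfile:
        for linea in lista:
            sfile.write(str(linea) + "\n")

# Hypothetical values, invented for illustration only
datos = ["1234567", "0,00", 42]

# Write to a throwaway directory so nothing real is overwritten
ruta = os.path.join(tempfile.mkdtemp(), "escrutinio.txt")
save_list_to_file(ruta, datos)

# Read the file back: one value per line, all stored as text
with open(ruta) as f:
    leido = f.read().splitlines()
print(leido)
```

Every element is converted with str(), so numbers and strings can be mixed in the list.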

The main function is:

def Escrutinio(dataHTML, ficheroSalida, jorn):
    soup = bs(dataHTML)  # BeautifulSoup object for the HTML code
    tEscritunio = soup.findAll("table", { "class" : "fill-table m1015" })  # the table
    tr = []
    for mydiv in tEscritunio:
        rows = mydiv.findAll('tr')
        for row in rows:
            if EsLineaValida(row.text):
                tmp = row.text.strip().split(nl)[-1]  # last column
                tmp = tmp.replace('.', '')  # adjustment: strip the dots
                tr.append(tmp)

    if len(tr) == 8:
        saveList2File(ficheroSalida, tr)
        print("Escrutinio recuperada!")
    else:
        # The data is not what was expected (for example, not published yet)
        print("DATOS de Escrutinio NO son los esperados!" + nl + str(tr))
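
The two string steps inside the loop, and the final check on the number of rows, can be tried without any HTML. This is only a sketch: the helper names ultima_columna and datos_completos and the sample row text are assumptions made for the example, and the row text imitates what row.text might return, with one cell per line:

```python
NL = "\n"

def ultima_columna(texto_fila):
    """Return the last line of a row's text with the dots removed."""
    return texto_fila.strip().split(NL)[-1].replace(".", "")

def datos_completos(valores, esperados=8):
    """True only when exactly the expected number of values was found."""
    return len(valores) == esperados

# Hypothetical text of one table row, invented for illustration only
fila = "\n1a (14 aciertos)\n12\n34.567,89\n"
valor = ultima_columna(fila)
print(valor)  # 34567,89

# With a single value the data set is treated as incomplete
print(datos_completos([valor]))  # False
```

Removing the dots leaves the decimal comma untouched, so the saved value still needs a comma-aware conversion if it is later used as a number.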

Here you can see the code running:



All the Python code is available on GitHub. You can use it.

As you can see, I am very happy with Python, because web scraping is very easy with Python and BeautifulSoup. In future posts, I will show more examples!

Have a nice day!

