Extract HTML code from URL in Python and C#


In this post, I am going to write a small Python exercise that extracts the HTML code from a URL, and then do the same in C#. You can find the Python code on GitHub.

Python code

The only module we need is urllib2 (this code is for Python 2):

```python
import urllib2

def gethtml(url):
    # Download the page and return its HTML; return '' on any error
    try:
        req = urllib2.Request(url)
        return urllib2.urlopen(req).read()
    except Exception:
        return ''

url = 'https://www.manejandodatos.es'
print(gethtml(url))
```

Running this code prints the HTML code of the page at that URL.


With this short, simple code I achieve my objective: getting the HTML code of a website. If you want to extract information from it, you will need additional packages, but I will explain that in a future post.
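The snippet above only runs on Python 2, which has reached end of life. In Python 3, urllib2 was split into urllib.request and urllib.error. What follows is a minimal sketch of the same `gethtml` function for Python 3. It is a port, not the code from the GitHub repository. The `timeout` parameter and the charset handling are additions made under the assumption that you want text rather than raw bytes. Like the original, it returns an empty string on any error.

```python
from urllib.request import Request, urlopen


def gethtml(url, timeout=10):
    """Return the HTML at `url` as text, or '' if anything goes wrong."""
    try:
        req = Request(url)
        with urlopen(req, timeout=timeout) as resp:
            raw = resp.read()
            # Use the charset the server announces; fall back to UTF-8 (an assumption)
            charset = resp.headers.get_content_charset() or "utf-8"
        return raw.decode(charset, errors="replace")
    except Exception:
        # Same behaviour as the original: any failure yields an empty string
        return ""
```

Usage is the same as before: `print(gethtml('https://www.manejandodatos.es'))`. Catching every exception keeps the original's behaviour. In real code you would probably catch narrower errors, such as `urllib.error.URLError`, so that failures stay visible.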

Now, coding in C#

I am going to code exactly the same thing as before, but in C#. I need three namespaces (System, System.Net and System.IO) and about 25-30 lines of code in total:

```csharp
using System;
using System.IO;
using System.Net;

namespace ConsoleApplication1
{
    class Program
    {
        static void Main(string[] args)
        {
            Console.WriteLine(getSourceCode("http://www.google.es"));
        }
        // Downloads the page at uri and returns its HTML, or "ERROR" on failure
        private static string getSourceCode(string uri)
        {
            string sourceCode = "";
            try
            {
                HttpWebRequest request = (HttpWebRequest)WebRequest.Create(uri);
                using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
                using (StreamReader sr = new StreamReader(response.GetResponseStream()))
                {
                    sourceCode = sr.ReadToEnd();
                }
            }
            catch (Exception)
            {
                sourceCode = "ERROR";
            }
            return sourceCode;
        }
    }
}
```

The code in the IDE:

[Screenshot: source code in C#]

And the execution in C#:

[Screenshot: source code of a web page, printed by the C# program]

Differences between C# and Python

In terms of the objective there is no difference: both languages do the job without any problem, and the execution time is similar in both. As you can see, the Python code is shorter.
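To check the execution-time comparison on your own machine, you can time the download in Python with `time.perf_counter`. The sketch below is one way to do it. `time_call` is a hypothetical helper name, not part of any library. Taking the best of several runs is a design choice intended to reduce noise.

```python
import time


def time_call(func, repeat=3):
    """Call `func` `repeat` times; return (best wall-clock seconds, last result).

    time_call is a hypothetical helper, not part of any library.
    """
    best = float("inf")
    result = None
    for _ in range(repeat):
        start = time.perf_counter()
        result = func()
        # Keep the fastest run, which is the least affected by noise
        best = min(best, time.perf_counter() - start)
    return best, result
```

For example, `seconds, html = time_call(lambda: gethtml('https://www.manejandodatos.es'))` times the Python function from this post. For a single page download, network latency usually dominates the total time, so the differences between languages tend to be small.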

In my opinion, the biggest difference between these two languages is that .NET requires more than 1 GB of disk space to install, while Python only needs about 100 MB. Python can even be run as a portable installation!

Have a nice day, and I hope you liked it!