February 16, 2008
HTTP Protocol
Hello class, today we are going to be discussing proxies, HTTP proxies specifically. The goal of this tutorial is to teach you
how the HTTP protocol works, and what proxies are. Let's begin :)
Whenever you load a page in your browser (I hope not Internet Explorer... (Opera FTW!!)), a few things happen behind the
scenes in order for the requested page to be shown. Let's say you type "http://google.com" into the address bar of your
browser. What will happen is this.
Your browser must start a connection with google.com. google.com is a domain name, not an address the browser can
connect to directly. Basically, Google has an IP, and they set up a DNS (Domain Name System) record so that google.com
points to that IP. So your browser will first look up the name in DNS and get the IP it points to, and it will then
start a connection with that IP. The IP for google.com is 64.233.167.99, so your browser connects to that and then
sends the request. It will look something like this:
GET / HTTP/1.0\r\n
HOST: google.com\r\n
Connection: close\r\n
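The lookup and the request above can be sketched in Python. This is only a minimal illustration, not what a real browser does: the function names `build_get_request` and `fetch` are made up for this example, and `fetch` needs network access to do anything.

```python
import socket

def build_get_request(host, path="/"):
    """Return the raw bytes of a minimal HTTP/1.0 GET request."""
    lines = [
        f"GET {path} HTTP/1.0",
        f"Host: {host}",
        "Connection: close",
        "",  # the empty line marks the end of the headers
        "",
    ]
    return "\r\n".join(lines).encode("ascii")

def fetch(host, path="/", port=80):
    """Resolve the host name, connect to the IP, send the request, return the raw reply."""
    ip = socket.gethostbyname(host)  # DNS lookup: name -> IPv4 address
    with socket.create_connection((ip, port), timeout=10) as conn:
        conn.sendall(build_get_request(host, path))
        chunks = []
        while True:
            data = conn.recv(4096)
            if not data:  # the server closed the connection
                break
            chunks.append(data)
    return b"".join(chunks)
```

Calling `build_get_request("google.com")` gives exactly the three lines shown above plus the empty line that ends the headers. What `fetch("google.com")` returns depends on what the server answers today, so no output is shown here.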
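Depending on the browser, more headers than the three lines in the request above may be sent. Strictly speaking, only
the GET line is mandatory in HTTP/1.0, but a server that hosts several sites on one IP needs the Host header to know
which site you want (HTTP/1.1 makes it mandatory). If everything goes okay, the server sends the okay along with the
page. Here, however, we asked for google.com, and Google wants the page to be loaded from www.google.com instead. So
Google will send something like this back to the browser: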
HTTP/1.0 301 Moved Permanently\r\n
Location: http://www.google.com/\r\n
Content-Type: text/html\r\n
Server: GWS/2.1\r\n
Content-Length: 219\r\n
Date: Fri, 31 Aug 2007 20:28:32 GMT\r\n
Connection: Keep-Alive\r\n
There are quite a few status codes in HTTP. The 300s are redirect codes, and 301 is a permanent redirect. So what
happens is the browser sees the redirect code and the Location: header, and resolves the IP for the host name in the URL
given by Location, which is http://www.google.com/. The resolved IP of www.google.com is 74.125.19.99. The browser now
starts a connection with that IP and sends the same request again, and what Google will send back is something like this:
HTTP/1.0 200 OK\r\n
Cache-Control: private\r\n
Content-Type: text/html; charset=UTF-8\r\n
Server: GWS/2.1\r\n
Date: Fri, 31 Aug 2007 20:28:32 GMT\r\n
Connection: Close\r\n
\r\n
[WEB PAGE CONTENTS GO HERE]
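Responses like the two above can be taken apart with a few lines of Python. This is a rough sketch under simplifying assumptions: `parse_response` and `redirect_target` are names invented for this example, and a header that appears more than once simply overwrites the earlier value.

```python
def parse_response(raw):
    """Split a raw HTTP response into (status_code, headers, body)."""
    head, _, body = raw.partition(b"\r\n\r\n")  # the empty line ends the headers
    lines = head.decode("iso-8859-1").split("\r\n")
    version, code, *reason = lines[0].split(" ", 2)  # e.g. HTTP/1.0 301 Moved Permanently
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()  # header names are case-insensitive
    return int(code), headers, body

def redirect_target(status, headers):
    """Return the Location URL for a 3xx response, otherwise None."""
    if 300 <= status < 400:
        return headers.get("location")
    return None
```

For the 301 response shown earlier, `redirect_target` returns the URL from the Location header, which is where the browser would send its next request. For a 200 response it returns None and the body is the page itself.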
The browser sees the 200 code, which means everything is okay. It sees Content-Type: text/html along with the charset, so
it knows how to parse the data. \r is a carriage return and \n is a new line. Together, \r\n ends each line, and
\r\n\r\n (the end of the last line followed by an empty line) means it is the end of the headers. The lines that follow
the first line of a request or response are called headers. Once the headers are over, any content will be sent. With
Google, the page contains an image (their logo), so the browser must send another request, this time using the URL of
the image. It will look something like this:
http://google.com/intl/en_ALL/images/logo.gif
The browser will get the IP, be redirected to www.goog..., and then the request will end up looking like...
GET /intl/en_ALL/images/logo.gif HTTP/1.0\r\n
Host: www.google.com\r\n
Connection: close\r\n
And Google will send:
HTTP/1.0 200 OK\r\n
Cache-Control: private\r\n
Content-Type: image/gif\r\n
Server: GWS/2.1\r\n
Date: Fri, 31 Aug 2007 20:28:32 GMT\r\n
Connection: Close\r\n
\r\n
[IMAGE DATA GOES HERE]
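The Content-Type header in the responses above is what decides how the body is treated. Here is a small sketch of reading it in Python, using the standard library's `email.message.Message` to split the media type from the charset. The helper names `parse_content_type` and `describe` are made up for this example, and the three-way decision in `describe` is a simplification of what a browser really does.

```python
from email.message import Message

def parse_content_type(value):
    """Return (media_type, charset) for a Content-Type header value."""
    msg = Message()
    msg["Content-Type"] = value
    return msg.get_content_type(), msg.get_content_charset()

def describe(value):
    """Say, roughly, what a browser would do with a body of this type."""
    media_type, charset = parse_content_type(value)
    if media_type.startswith("image/"):
        return "display as an image"
    if media_type == "text/html":
        return f"parse as HTML ({charset or 'no charset given'})"
    return "hand off to some other handler"
```

With this, `parse_content_type("text/html; charset=UTF-8")` gives `("text/html", "utf-8")` and `parse_content_type("image/gif")` gives `("image/gif", None)`.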
Notice the "Content-Type: image/gif". That tells the browser to display it as an image rather than as HTML. That is how a
basic GET request looks. You might be wondering how to submit the form data on Google to search for something. It's
real simple. If you look at Google's source, you will see:
<form action="/search" name=f><table cellpadding=0 cellspacing=0><tr valign=top><td
width=25%> </td><td align=center nowrap><input name=hl type=hidden value=en><input
maxlength=2048 name=q size=55 title="Google Search" value=""><br><input name=btnG
type=submit value="Google Search"><input name=btnI type=submit value="I'm Feeling
Lucky"></td><td nowrap width=25%><font size=-2> <a href=/advanced_search?hl=en>Advanced
Search</a><br> <a href=/preferences?hl=en>Preferences</a><br> <a
href=/language_tools?hl=en>Language Tools</a></font></td></tr></table></form>
Let's clean that up a bit...
<form action="/search" name=f>
<input name=hl type=hidden value=en>
<input maxlength=2048 name=q size=55 title="Google Search" value="">
<input name=btnG type=submit value="Google Search">
<input name=btnI type=submit value="I'm Feeling Lucky">
</form>
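The default method for submitting a form is GET; since method="POST" wasn't specified, this form uses GET. The name of
each <input> is paired with its value and appended to the URL as a query string, so if we search for "test" and hit
submit, it will look like this (a real browser sends only the submit button that was actually clicked; both are shown
here for illustration):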
http://www.google.com/search?hl=en&q=test&btnG=Google+Search&btnI=I%27m+Feeling+Lucky
and so the request will look like:
GET /search?hl=en&q=test&btnG=Google+Search&btnI=I%27m+Feeling+Lucky HTTP/1.0\r\n
Host: www.google.com\r\n
Connection: close\r\n
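Turning form fields into that query string is something Python's standard library can do for you. A minimal sketch, assuming the field names and values from the cleaned-up form above (note the capital I in btnI); `FIELDS` and `build_search_url` are names made up for this example.

```python
from urllib.parse import urlencode

# (name, value) pairs in the order they appear in the form
FIELDS = [
    ("hl", "en"),
    ("q", "test"),
    ("btnG", "Google Search"),
    ("btnI", "I'm Feeling Lucky"),
]

def build_search_url(fields, host="www.google.com", action="/search"):
    """Build the URL a GET form submission would request."""
    query = urlencode(fields)  # spaces become +, the apostrophe becomes %27
    return f"http://{host}{action}?{query}"

print(build_search_url(FIELDS))
# http://www.google.com/search?hl=en&q=test&btnG=Google+Search&btnI=I%27m+Feeling+Lucky
```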
Google should then send back the results for our search. POST works in much the same way. When you hit submit, the
browser takes the <input> names and values and combines them the same way. The only difference is that the data isn't
made part of the URL, so it would look like this:
hl=en&q=test&btnG=Google+Search&btnI=I%27m+Feeling+Lucky
Then, instead of GET /search, the browser sends POST /search and puts the data after the headers, so the request would look like this:
POST /search HTTP/1.0\r\n
Host: www.google.com\r\n
Content-Length: 56\r\n
Content-Type: application/x-www-form-urlencoded\r\n\r\n
hl=en&q=test&btnG=Google+Search&btnI=I%27m+Feeling+Lucky
Content-Length tells the server how many bytes of data you will be sending, and Content-Type tells it that the data is
URL-encoded form data, as submitted from a <form> on a page.
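To see where the Content-Length number comes from, here is a sketch that builds the same kind of POST request by hand. `build_post_request` is a name made up for this example, and the field list is assumed to match the form shown earlier.

```python
from urllib.parse import urlencode

FIELDS = [
    ("hl", "en"),
    ("q", "test"),
    ("btnG", "Google Search"),
    ("btnI", "I'm Feeling Lucky"),
]

def build_post_request(host, path, fields):
    """Return the raw bytes of an HTTP/1.0 POST carrying URL-encoded form data."""
    body = urlencode(fields).encode("ascii")
    head = "\r\n".join([
        f"POST {path} HTTP/1.0",
        f"Host: {host}",
        f"Content-Length: {len(body)}",  # counted in bytes, body only
        "Content-Type: application/x-www-form-urlencoded",
        "",  # the empty line ends the headers
        "",
    ]).encode("ascii")
    return head + body

request = build_post_request("www.google.com", "/search", FIELDS)
```

The encoded form data here is 56 bytes long, which is where the Content-Length of 56 comes from. Nothing follows the body: the server knows it is complete once it has read that many bytes.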
The only main thing left to cover is cookies. Let's say you logged into a site using a POST form. It would show you the
page you are meant to see, but if you then click on the link to check your mail, the browser will not resend the POST
data, and so the mail page would have no way of knowing you are logged in, or who you are. The way this is solved: when
you submitted the POST data and logged in, the server also saved a cookie on your computer. The contents of the cookie
in this example are: id=48&loggedin=1. Then, when you click on the mail link, your browser sends the cookie back to the
server, and the server knows that your id is 48 and that you are logged in. Here is the header the server will send you:
Set-Cookie: id=48&loggedin=1; expires=Sun, 28-Nov-2010 20:28:32 GMT; path=/; domain=.site.com\r\n
The first part just contains the variables. The second part tells us when the cookie will expire and should be deleted.
The third part gives the path, in this case /, so the cookie is used for any path on the site. The fourth part tells
us the domain. See the . in front of site.com? That tells us the cookie will also be sent to subdomains such as login.site.com.
Now, when the browser loads the mail page, it will send this header:
Cookie: id=48&loggedin=1\r\n
Real simple huh?
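Here is a rough Python sketch of what the browser does with those two headers. All three function names are made up for this example, and it assumes every attribute is written as key=value (so expires=..., with an equals sign). Formally the cookie's name here is id and its value is 48&loggedin=1; making sense of the value is up to the site.

```python
def parse_set_cookie(header_value):
    """Split a Set-Cookie value into (name, value, attributes)."""
    parts = [part.strip() for part in header_value.split(";")]
    name, _, value = parts[0].partition("=")  # first part is the cookie itself
    attributes = {}
    for part in parts[1:]:
        if not part:
            continue
        key, _, val = part.partition("=")
        attributes[key.strip().lower()] = val.strip()
    return name, value, attributes

def domain_matches(host, cookie_domain):
    """True if a cookie set for cookie_domain should be sent to host."""
    bare = cookie_domain.lstrip(".").lower()
    host = host.lower()
    return host == bare or host.endswith("." + bare)

def cookie_header(name, value):
    """Build the header the browser sends back on later requests."""
    return f"Cookie: {name}={value}"

name, value, attributes = parse_set_cookie("id=48&loggedin=1; path=/; domain=.site.com")
if domain_matches("login.site.com", attributes["domain"]):
    print(cookie_header(name, value))  # Cookie: id=48&loggedin=1
```

This leaves out expiry and path checks, which a real browser also applies before sending a cookie.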
So those are the basics of how the HTTP protocol works. Of course there is a lot more, but that is all that is needed to
complete most page transactions. For more about it, you can go to RFC 2616
(http://www.w3.org/Protocols/rfc2616/rfc2616.html).
Now on to the proxy part. I assume everyone has at least heard of a proxy, and I'm sure most of you at least sort of know
what it is. Let me just explain it to make sure everyone is on the same page.
There are different types of proxies. They all do the same thing, but for different protocols. We will be talking about
an HTTP proxy, the kind used for the web.
Understanding what a proxy does is very simple; understanding how it does it is a bit harder. In our earlier look at
the HTTP protocol, we talked about how the browser sends the initial request to the server, and the server sends the
response back. With a proxy, instead of sending the request to the server, you send it to the proxy. The proxy then
sends your request to the server. The server has no idea you exist; it only sees the proxy, so it sends the response
back to the proxy, and the proxy sends that response to you. Very simple, but in case some of you are still a little
confused, here's a diagram.
Normal Transaction:
YOU --> SERVER | you send the request to the server
YOU <-- SERVER | the server sends the response to you
Proxy Transaction:
YOU --> PROXY --> SERVER | you send the request to the proxy, the proxy forwards it to the server
YOU <-- PROXY <-- SERVER | the server sends the response to the proxy, the proxy forwards it to you
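The two diagrams can be acted out with a toy model in Python. Nothing here touches the network: `origin_server`, `direct`, and `via_proxy` are made-up stand-ins, and the addresses are placeholders. The only point is to show whose address the server gets to see.

```python
def origin_server(request_line, peer_address):
    """Stand-in for the real web server: it only knows who connected to it."""
    return f"200 OK (served {request_line!r} to {peer_address})"

def direct(request_line, client_address):
    """Normal transaction: the server sees the client's own address."""
    return origin_server(request_line, client_address)

def via_proxy(request_line, client_address, proxy_address):
    """Proxy transaction: the proxy makes the request on the client's behalf."""
    # The proxy opens its own connection, so the server sees proxy_address.
    response = origin_server(request_line, proxy_address)
    # The proxy then hands the response back to the client unchanged.
    return response

print(direct("GET / HTTP/1.0", "10.0.0.5"))
print(via_proxy("GET / HTTP/1.0", "10.0.0.5", "192.0.2.1"))
```

In the second call the client's address never reaches `origin_server`, which is the whole idea of the proxy transaction in the diagram.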
It's really simple, and I hope everyone understands what it does. To make a proxy server, you have to know how to code
in some kind of capable language, PHP for example.
What we will need to do is this: get the request from the user, send it to the server, get the response from the server,
parse it, and send it to the user. The reason we need to parse it is that the server writes the page assuming it will be
viewed from the real site. What actually happens is that the proxy gets the page source from the server and displays it
from its own address, making it look like the page is the proxy's own, when in fact it is not. That matters if any
relative paths are used, for example:
<img src="/images/logo.gif">
Rather than
<img src="http://www.site.com/images/logo.gif">
For the first example, the browser would normally add the URL of the server onto the path and load the image. However,
if it does that with a page shown by the proxy, it won't find the image (the page isn't the proxy's, so the image isn't
stored on that server, but the browser does not know this). So our proxy must parse the URLs and make the browser load
them from the real server. However, if we point them straight at the real server, only the main page will be requested
through the proxy; everything else, like images, will be loaded directly by the browser, defeating the purpose. So what
we do is make every URL point at the proxy, and tell the proxy what page to GET.
I've done my best to explain this, but it can be a little complicated at first, so if there is anything you are unsure
of, feel free to let me know, and I will do the best that I can to help out.
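The rewriting step can be sketched in Python like this. It is a simplified illustration, not a finished proxy: `rewrite_urls` is a name made up for this example, the proxy address http://proxy.example/fetch?url= is hypothetical, and the regular expression only handles attribute values written inside quotes.

```python
import re
from urllib.parse import quote, urljoin

# Matches src="...", href="..." or action="..." (single or double quotes).
ATTRIBUTE = re.compile(r"""(\b(?:src|href|action)\s*=\s*)(["'])(.*?)\2""", re.IGNORECASE)

def rewrite_urls(html, page_url, proxy_prefix):
    """Point every quoted src/href/action in html back at the proxy."""
    def replace(match):
        start, quote_char, url = match.groups()
        absolute = urljoin(page_url, url)  # resolve relative paths against the real page
        proxied = proxy_prefix + quote(absolute, safe="")  # pass the real URL as a parameter
        return f"{start}{quote_char}{proxied}{quote_char}"
    return ATTRIBUTE.sub(replace, html)

print(rewrite_urls(
    '<img src="/images/logo.gif">',
    "http://www.site.com/index.html",
    "http://proxy.example/fetch?url=",
))
# <img src="http://proxy.example/fetch?url=http%3A%2F%2Fwww.site.com%2Fimages%2Flogo.gif">
```

Now the browser asks the proxy for the image, and the url parameter tells the proxy which page to GET from the real server. Pages with unquoted attributes, like the Google form shown earlier, would need a proper HTML parser rather than a regular expression.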
# 2008/02/16 10:11 | Basic Articles