Let’s start the article with a little experiment. Take a look at the following two URLs and answer: what domain do they point to?
If you thought ‘Easy, bbc.com’, you’re half right: the first URL leads to that domain. The second? Not even remotely. If you’re confused, it’s because, even though Internet users spend all day clicking on URLs, we actually know little about them.
The trick of the at sign
The point is that, in reality, everything between ‘https://’ and the at sign is ignored by the browser when it decides where to go. Originally, URLs could have a structure similar to e-mail addresses, indicating a ‘username’ before the ‘@’.
By using it, we were telling the website to log us in with that username. In fact, the same structure could carry the password as well, placed after the username and separated from it by a colon (username:password@domain).
However, modern browsers, for security reasons, prevent users from accidentally authenticating on a website with a single click, so they ignore everything entered before the at sign and simply take the user to the part of the URL to its right.
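To see the rule in action, here is a minimal Python sketch that uses the standard library's urllib.parse module. The URLs and the helper name describe are our own illustrative assumptions (example.com and example.org are reserved documentation domains); they are not the URLs from the experiment above.

```python
from urllib.parse import urlsplit


def describe(url):
    """Split a URL and report the login information and the real host."""
    parts = urlsplit(url)
    return {
        "username": parts.username,  # text before ':' or '@', if any
        "password": parts.password,  # text between ':' and '@', if any
        "host": parts.hostname,      # the machine the URL actually points to
    }


# Hypothetical URLs, for illustration only.
print(describe("https://user:secret@example.com/"))
print(describe("https://bbc.com@example.org/news"))
```

In the second call, ‘bbc.com’ comes back as the username, while the host is example.org: the part to the right of the at sign.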
Anatomy of a URL
By now you may have spotted a couple of possible flaws in the example at the beginning of the article. We will address them later; for now, let’s review all the parts a URL can be made up of:
- Protocol: Tells the browser how to connect to the server. “http://” is not the same as its secure version (“https://”), although both connect to web pages, nor as “mailto:” (for e-mail addresses, written without the double slash), “ftp://”, “gopher://”, etc. Mandatory, although browsers no longer always show it.
- Login information: As explained above, we can indicate only the username or also include the password. Optional, and less and less common.
- Domain: The axis around which a URL revolves: it tells the browser which machine on the Internet to connect to (with the help of DNS). Mandatory. In some cases it appears divided by periods, indicating a ‘subdomain’ (1st course.colegio.es).
- Port: Each protocol has one or more default ports, but sometimes we may want the browser to connect, for example, to a secondary web server installed on the same machine as another; in those cases, it is usual to specify a different port. Optional.
- Path: Used when we do not want to access the main page, but a subdirectory of a website. Optional, but very common.
- Parameters: Also known as the ‘query’; we describe its many uses in detail here.
- Fragment: Tells the browser not to display the page from the top, but to jump to a specific point on it.
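The parts listed above can be pulled out of a URL programmatically. The following is a sketch in Python with the standard urllib.parse module; the URL, the function name anatomy and the dictionary labels are assumptions we made up to mirror the list, not anything from a real site.

```python
from urllib.parse import urlsplit, parse_qs

# Hypothetical URL that contains every part described in the list.
URL = "https://user:secret@blog.example.com:8443/articles/urls?lang=en&page=2#anatomy"


def anatomy(url):
    """Return the parts of a URL, labelled as in the list above."""
    parts = urlsplit(url)
    return {
        "protocol": parts.scheme,
        "username": parts.username,
        "password": parts.password,
        "domain": parts.hostname,
        "port": parts.port,                   # None when the URL states no port
        "path": parts.path,
        "parameters": parse_qs(parts.query),  # query string as a dict of lists
        "fragment": parts.fragment,
    }


for name, value in anatomy(URL).items():
    print(f"{name:>10}: {value}")
```

Only the protocol and the domain are always present; for a URL such as https://example.com, the login fields and the port come back as None and the remaining fields are empty.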
“But there are no domains with file extensions!” (Yes, there are)
Going back to the example from the beginning, you can already deduce that, unlike the first URL, the second one does not lead to a ZIP file on the BBC server but (remember what we said about the use of ‘@’) to the domain ‘informeonu23.zip’.
If you are thinking that ‘.zip’ is not a valid domain extension, you are out of date. This very month, Google launched 8 new domain extensions:
As you can see, the extensions .zip (as in compressed files) and .mov (as in movies) are among them. And before that there were already .sh (country domain of Saint Helena, matching Bash scripts), .pl (country domain of Poland, matching Perl files) and .rs (country domain of Serbia, matching Rust files).
Imagine that a cybercriminal had registered the .zip domain in the example and that users, on visiting it through a malicious link like that one, actually downloaded a ZIP file that obviously has nothing to do with the BBC: it could be a source of malware.
And now imagine the same thing happening… but with an official GitHub repository before the ‘@’.
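As a rough illustration of how such links could be flagged, the sketch below checks whether the real host of a URL ends in one of the extensions mentioned above. It is a crude heuristic under our own assumptions: the set FILE_LIKE_TLDS and the function name host_looks_like_file are hypothetical, the two URLs are illustrative rather than the ones from the experiment, and perfectly legitimate sites (any Polish .pl domain, for instance) would be flagged too.

```python
from urllib.parse import urlsplit

# Extensions named in the article that are both domain endings and file types.
FILE_LIKE_TLDS = {"zip", "mov", "sh", "pl", "rs"}


def host_looks_like_file(url):
    """Return True if the real host of the URL ends in a file-like extension."""
    host = urlsplit(url).hostname or ""  # hostname is already lowercased
    if "." not in host:
        return False
    return host.rsplit(".", 1)[-1] in FILE_LIKE_TLDS


# Illustrative URLs only: a file on a server versus a host that looks like a file.
print(host_looks_like_file("https://bbc.com/informeonu23.zip"))
print(host_looks_like_file("https://bbc.com@informeonu23.zip"))
```

The first URL points to a file on bbc.com, so nothing is flagged; in the second, the host itself is ‘informeonu23.zip’.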
“Hey, ‘/’ can’t be used in usernames!” (It’s true, you can’t, but…)
If the ‘/’ that we use to form URLs cannot be included in usernames (just as it cannot be included in file names, for the same reasons), the example malicious URL we gave would not be valid, since an error would be thrown. Does that mean you’ve read all this for nothing?
No. Because although we cannot use /, we can use (and, in fact, it is what we have used) the symbol ∕. Virtually indistinguishable, right? Such are the traps of Unicode characters that look like ASCII symbols; what can we do?
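One way to catch this trick is to look for slash look-alikes before trusting a link. The sketch below is only an illustration under our own assumptions: the set SLASH_LOOKALIKES is not exhaustive, the function names are hypothetical, and the sample link is invented (it reuses the ‘informeonu23.zip’ domain from the example and a made-up repository path).

```python
import unicodedata

# Slash look-alikes to watch for (an illustrative, non-exhaustive set).
SLASH_LOOKALIKES = {"\u2044", "\u2215"}  # FRACTION SLASH, DIVISION SLASH


def find_slash_lookalikes(url):
    """Return (position, Unicode name) for every slash look-alike in url."""
    return [
        (i, unicodedata.name(ch))
        for i, ch in enumerate(url)
        if ch in SLASH_LOOKALIKES
    ]


def real_host(url):
    """Return the host (with port, if present), treating only the
    ASCII '/', '?' and '#' as delimiters."""
    rest = url.split("://", 1)[-1]
    for i, ch in enumerate(rest):
        if ch in "/?#":
            rest = rest[:i]
            break
    return rest.rsplit("@", 1)[-1]


# Hypothetical link: the "slashes" before the '@' are U+2215, not '/'.
suspicious = "https://github.com\u2215example\u2215repo@informeonu23.zip"

print(find_slash_lookalikes(suspicious))
print(real_host(suspicious))
```

Because the look-alikes are not real slashes, everything up to the ‘@’ counts as login information, and the host that remains is the .zip domain.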
That is why we must be very careful with this potential scam, which has not yet been detected in use by cyber-scammers, but which we will surely begin to see shortly, now that anyone can register the new .zip domains.
Via | Bobbyr on Medium
Image | Daniel on Pixabay + Miguel Á. Padriñán on Pixabay