Code WTF: Check Ethernet Address
by Mike
So I guess I am daring, but not many software developers/programmers would publicly disparage their code like I am about to do. I was looking at some code I wrote about 3.5 years ago when I was first learning python during a summer job in the computer science department at my University and this is what I found.
Basically the intention of the code shutdown computers remotely(x number of hosts in a computer lab) in a wakeable state and had a function to do a wake on lan(for x number of wakeable hosts). This is used in a web based lab administration system and it works pretty well.
Here is the code in question(just the Wake on Lan part):
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 | import socket import struct import string def Wol(eth_addr, ip): """wake on lan given an ethernet/mac address and IP""" byte_addr = [] # check that our eth address is the correct format addr = CheckEthFormat(eth_addr) if addr == False: print "Ethernet/MAC Address in wrong format" return False #take our given ethernet address and put it into 6 sub strings for x in range(len(addr)): if ((x*2) + 1) > len(addr): break byte_addr.append(addr[x*2] + addr[(x*2)+1]) #construct a string with a 6 byte hardware address eth_addrB = struct.pack('BBBBBB',int(byte_addr[0]), int(byte_addr[1]), int(byte_addr[2]), int(byte_addr[3]), int(byte_addr[4]), int(byte_addr[5])) #construct our message for the Wake on Lan packet # 12 leading F's + target machines mac address 16 times packet = "F" * 12 + eth_addrB * 16 #create a datagram socket to send our packet sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM) sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1) # send our message sock.sendto(packet, ip) #close the socket sock.close() |
So whats the wtf here?
Honestly, I do see some other potential WTF’s(but lets be generous because I was very new to python at the time). I think the biggest WTF is what you don’t really see there.
do you see line 10?
10 | addr = CheckEthFormat(eth_addr) |
That is the problem. Doesn’t seem like much, until you see the CheckEthFormat() function:
1 2 3 4 5 6 7 8 9 10 11 12 13 | def CheckEthFormat(eth_addr): """Check the format of an ethernet address""" mac = '' for x in eth_addr: if x in string.hexdigits: mac += x #check if Ethernet/Mac Address is correct length if mac != 12: return False #correct length return the correctly formatted address else: return mac.trim() |
So whats wrong here? Lots, because of the way this is done, it left me with doing all kinds of weird stuff. First of all, this is the only place where I actually will check the formatting of an eth address, so it doesn’t actually needs its own function. Secondly its slow(especially if you are doing this for lots of computers because of the for loop and unnecessary evaluations.
Also because of the weird way I did this, I did something really funky things to split the mac address(this is a wtf in itself):
15 16 17 18 | #take our given ethernet address and put it into 6 sub strings for x in range(len(addr)): if ((x*2) + 1) > len(addr): break byte_addr.append(addr[x*2] + addr[(x*2)+1]) |
*shakes head*, this is a very sad state. I am sure there are lots of ways to improve this simple code but here are some of the easiest. First we are going to remove the CheckEthFormat Method. Second we will remove the requirement that strings need to be given to us without “:” or “-” (so all the eth addresses we get are in the format 00-00-00-00-00-00 or 00:00:00:00:00:00).
Then we rewrite our code like this:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 | import socket import struct import string import re eth_pattern = '([a-fA-F0-9]{2}[:|\-]?){6}' def Wol(eth_addr, ip): """wake on lan given an ethernet/mac address and IP""" byte_addr = [] # check that our eth address is the correct format if re.match(eth_pattern, eth_addr) == None: print "Ethernet/MAC Address in wrong format" return False #take our given ethernet address and put it into 6 sub strings #split on '-' or ':' byte_addr = re.split('[:|\-]', eth_addr) if len(byte_addr) != 6: return False #construct a string with a 6 byte hardware address eth_addrB = struct.pack('BBBBBB',int(byte_addr[0]), int(byte_addr[1]), int(byte_addr[2]), int(byte_addr[3]), int(byte_addr[4]), int(byte_addr[5])) #construct our message for the Wake on Lan packet # 12 leading F's + target machines mac address 16 times packet = "F" * 12 + eth_addrB * 16 #create a datagram socket to send our packet sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM) sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1) # send our message sock.sendto(packet, ip) #close the socket sock.close() |
Added a little regex magic in there, and the code is fixed, shorter, faster and more readable. To eek out a bit more performance out of that regex you can first do a(updated: thanks Gedge):
pattern = re.compile(eth_pattern) pattern.match(eth_addr) #I believe this is the syntax
I am going to be looking at some of the code I have hanging around see If I can find any more WTF’s to share with you all(I am sure I can find plenty).
Any other comments about my old code? Or the new one? Any optimizations(I am sure there are)? Leave a comment!
Comments
New Blog Post: Code WTF: check ethernet address http://bit.ly/ugOoJ
This comment was originally posted on Twitter
You won’t really gain much performance (if any) out of `re.compile(eth_pattern).match(eth_addr)` if you’re doing it every time. basically, re.match is doing that same line, so the performance gain is merely due to one less function call. Now if you compiled eth_pattern on the same line, say:
eth_pattern = re.compile(‘([a-fA-F0-9]{2}[:|\-]?){6}’)
then you would get a performance gain. Definitely moving over to using a regex is gonna get ya a lot more performance than what you had previously though! The code can also be broken in a couple of ways, unless I’m missing something:
1. Since you have the separators as an optional thing in the regex, if they were left out and you go to split, byte_addr will be of length less than 6. Hence, line 20 will break.
2. int(byte_addr[xx]) will break if there are characters, so be sure to use base 16 with int(byte_addr[xx], 16)
Optimizations I can’t really think of. Maybe you could make use of re.findall and do something like:
#——————————————————-
patt = ‘[a-fA-F0-9]{2}’
data = re.findall(eth, patt)
if len(data) != 6: return False
eth_addrB = struct.pack(‘BBBBBB’, *map(int, data, [16, 16, 16, 16, 16, 16]))
#——————————————————-
It’s a little less clear what’s going on there, but it *might* be more efficient (I’m too lazy to check
). Note that re.findall finds non-overlapping sequences, so this should be fine.
I’m trying to convince people to have WoL capabilities for our underwater system. Currently, if it shuts down for some reason, someone has to go out there, dive down, and turn it back on. But if they do have WoL then now I don’t have to look up the format of the magic packet because t’is right here in this post!
P.S. What kind of formatting stuff can I use in my comments?
Sorry, a mistake was made in that code snippet. It should read `re.findall(eth_pattern, eth_addr)`. Copy and paste error from when I double checked to make sure it worked. You can also compile the pattern as an additional optimization.
Jason, you certainly are always right! I will fix up that code when I get home later tonight. Thanks for the input!
I honestly don’t know, which formatting stuff you can use, I think for the most part it supports html for images and such. if you are putting code in there you should be able to use pre tags as well.
[...] was actually the subject of my Code WTF post , so I thought I would jazz it up and put it up on GitHUB as well as commit the new version back [...]