For a long time, I’ve been trying to run far away from Microsoft Excel.
It seems like any desk job you get, knowing the software is one of the untold pre-requisites, and I can understand why. People really swear by the program as a primary means to get things done on the computer. It’s an instant way to display and manipulate data, and extremely easy to learn. What’s not to like?
Microsoft even bundles their VBA language for manipulating rows of data; you can loop it and it seems to work OK. Even though I am pretty good at Excel, I see the app as more of an excuse than anything else. The reasons that stick out in my mind are that documents are not standardized, VBA can be grueling, sharing becomes a huge issue, and scaling is really out of the picture when you’re limited to ~50k rows.
For most people’s it’s just fine and will stay that way. However, if you need to store a lot of data, it’s not an option.
I should preface the rest of this piece by saying that I am not your typical programmer. In fact, I don’t really consider myself to be one, so take my advice here with a grain of salt. If you were a computer science major in college and got one look at my code, you’d probably cringe. I code mainly to get things done and solve problems. As I learn more, I implement more.
As such, you’ll find me writing pseudo code only half of the time, skipping the flow charts (they’re in my head), and making many pitstops over at Stack Overflow when I’m writing a script.
That’s another thing — even though I am very comfortable with languages like Python and C, I haven’t begun to learn true object-oriented programming yet like Objective C or Java. The idea of making websites to interact with databases is much more appealing to me, and I am in the beginning phases of doing this right now.
But still, moving to databases and code are going to be a free but time-consuming option, yet will leave you a much more valuable and powerful asset. You can get to a point where you only crack out the Excel to convert documents to CSV if someone happens to give you an XLSX document.
One of my strongly-held beliefs is that using MySQL in conjunction with PHP and/or Python makes for a much more powerful piece of equipment than Excel can offer. You can’t learn these three things overnight, but when the system become more familiar to you, you will have the tools to accomplish any sort of data-driven project you can dream of.
Designing databases is a job in and of itself. The way I learned how to do this was first through school. I took a class with a professor that everyone seemed to dislike; he was demanding, sure, but he taught me how to create workable relational databases, and for that, it was well worth it.
If you understand the basics, how to set up database tables, how to make relationships/keys/indexes, you will go a long way. There are general books out there that will cover most of these topics. Then, you have to learn how to talk to the database, by using a “query”. I suggest spending some time at SQLZOO if you’re a beginner here.
In my free time, I’ve come up with many analogies to the importance of building a database. My favorite is comparing it to being an architect. You can build something like a hotel according to your client’s wishes, while still following the legal codes. Your limitations are that you have to make something that is accessible, fits guests needs, and is within the designated plot you were given. Then, you have to have some capacity to get people in and out of the building, to make it functional rather than just some neat structure. That’s where the code comes in.
It’s a fact: everyone loves the Python programming language. Why? Well, manipulating strings (variables that hold text) is a breeze. Lots of people have written modules for it to do anything and everything. Taking data from the web is very very easy with modules like Beautiful Soup. Syntax will not make you pull your hair out (though tabbing might). If you are going to learn one language, this should really be it. The best website to visit to do this would be Codeacademy. With prior programming experience, it took me about a week to get most of the basics down.
I get that people love Linux, Windows, and Chrome OS. I’m not going to argue for or against these platforms, but I will say that the Mac has worked the best for me over the years, and that’s what I have going for me right now. Assuming you’ve familiarized yourself with the basics of databases and Python, we’ll go into how to get it all working together on OS X, with a few examples.
To install MySQL, you must go here and select the appropriate community server DMG file. Make sure to run all the packages it gives you. To turn on MySQL, all you need to do is go to the preference pane and do that. Be careful here, because every user account on the machine now technically has access to the database.
To connect to the database, I suggest using Sequel Pro. I have heard Navicat is good, but Sequel Pro is free. If you want to do more advanced things to the database for free, you can also try out Oracle’s MySQL Workbench, which has come a long way over the years.
Python 2.x is already installed on Mac OS X. That does not mean you have to go and install the latest 3.x version. In fact, this would be a mistake, because the versions work so differently from one another. One day it will make sense to do this, but not now.
To check what version of Python you’re running, open Terminal and enter this command:
When I did this it told me that I’m using “Python 2.7.5”.
I’d also suggest using a program like Sublime Text 2 to write your code. The particular program allows you to execute some code right from it, but you have to be careful doing this. Sublime Text 2 sometimes uses a different Python version than what the system uses (or at least I have once ran into this issue), so if you’ve installed 3rd party modules that other people have written, the program will pretend like they don’t even exist. Further, you may get an “EOF” error even if it looks like you’ve written your code correctly. This is because Sublime Text 2 can’t process the “raw_input()” function, where you ask a user for some input.
The way you should execute your code is through Terminal. If you’re unfamiliar with using it, you really only need to know a few commands for now.
Open up Terminal and enter the above command. It will show you all of the files and directories in your current location. The default location you are typically placed in is your home directory, so you’ll see Documents, Downloads and so forth. You can switch to something like the Download folder by putting in the following command:
Not too bad, right? In fact, a faster way of doing this would be to type the “cd” part and begin to type “Downloads”. On the third letter or so, pressing tab should auto-complete the name. If there are two competing folder names with the typed letters, you may have to type more and then press tab again.
To go back a folder, simply type “cd” by itself. To go to the top-level directory of your computer, you can type “cd /”. If your folder has a space in it, you must “escape” it by preceding it with a backslash. For example, let’s say that I want to switch to a folder named “My Stuff”. The way to do this would be to type:
cd My\ Stuff/
Once you’ve navigated to a folder containing your python code, all you have to do is type “python [code].py” with “[code]” being whatever you named your file to execute it. Python code files must have the “py” extension for them to run.
Assuming you’ve learned the basics for both of these steps, you’ll be ready to tackle Part II (forthcoming), where we’ll get the two applications to talk to each other with some example pieces of code!