thoughts from a restless mind. |
20. SoCal born and raised. I play volleyball; generally twice a week. Berkeley undergrad: computer science & math My tag frequencies say kind of a lot about me. I like thinking, stories, listening, you~ |
i am the path along unseen heatherSnowball (also called a Chaterism): A poem in which each line is a single word, and each successive word is one letter longer. One of the constrained writing techniques utilised by the Oulipo (Workshop of Potential Literature).
o we all have heard people believe anythingGiven the mathematical genesis of the Oulipo and the interest in the movement among other programmers, I thought that someone must have created a program to generate these, and I was surprised that I couldn’t find one even after some pretty thorough Googling. So I wrote one myself. The C++ code is here.
It takes input from a text file which contains novels from Project Gutenberg, scans for word pairs where the second word is longer by one letter, and builds up a poem using Markov chains.
i am the dawn light before anybody expected something disorderlyThe poems in this post were all created by the program. They have not been edited.
i am the very great change
(via stealyrcarbon)
jbaker1225 of Imgur gutted a broken NES and put a little RPi in it and made an emulation box out of it. Follow the link above for all of the photos and details about how he did it.
CAN WE MAKE THIS A SUMMER PROJECT TOO :D
(via kalda341)
Brings back memories of IBM’s Master the Mainframe competitions :D
There’s something so ‘war games’, to me, about finding all these mainframes available on the internet. What follows is my gallery of the Logon screens I’ve collected. I’m not going to identify them (other than whats in the screenshot). Enjoy! (I’ll try to keep this updated as I add more to it)
If anyone reading this has some to share feel free to send them to me.
(via c1qfxugcgy0)
Read moreCharles Babbage (1864)
(Source: aofa.cs.princeton.edu, via isomorphismes)
Earlier this week, Blake asked me for some help with a problem he’s working on. He has a couple of hash functions that are being used to route web traffic to a number of different servers. A hash function takes an input, such as a blog’s url, and outputs a number between 0 and 232. Say we have 1000 servers, that means that each one will handle about 430 million points in the hash-space.
The data looked something like this (with fake blog names, of course):
## blog.name H1 H2 H3 H4 ## 1 23.tumblr.com 3.137e+09 1.866e+09 6.972e+08 5.792e+08 ## 2 19.tumblr.com 1.875e+09 2.545e+08 2.606e+09 1.312e+09 ## 3 34.tumblr.com 1.366e+09 2.236e+09 1.106e+09 3.640e+09 ## 4 43.tumblr.com 2.639e+09 1.098e+09 8.755e+08 1.507e+09 ## 5 90.tumblr.com 6.564e+08 5.397e+07 3.084e+09 2.961e+09 ## 6 29.tumblr.com 2.476e+09 4.532e+08 2.787e+08 4.894e+08One important thing to point out before we get started is that this has to be a representative sample of the request data. Despite the wild popularity of my personal blog, it doesn’t get a sliver of the traffic that Beyonce gets. That fact needs to be represented in the sample data, meaning that her blog should appear in more rows of the sample data than mine.
Plot the data
The first thing I ever do is plot data to get a sense of what I’m working with and what I’m trying to accomplish. The density plots below show the distribution of values in the hash space for each algorithm. If you’re not familiar with kernel density plots, you can imagine this to be a smoothed (and prettier) version of a histogram. For the electrical engineers out there, it’s the sum of the convolution of a kernel function (usually a gaussian), with an impulse function at each of the points on the x-axis (represented here by dots).
By comparison, here are the density plots of a “near-ideal” example (1000 pulls from a uniform distribution) and a bad example (all assigned to the value 2e+09). The worst case example here shows the shape of the kernel function.
Calculate the entropy
In information theory, entropy is the minimum number of bits required (on average) to identify an encoded symbol (stay with me here…). If we’re trying to transmit a bunch of text digitally, we would encode the alphabet where each “symbol” is a letter that will be represented in 1’s and 0’s. In order to transmit our message quickly, we want to use as few bits as possible. Since the letter “e” appears more frequently than the letter “q”, we want the symbol for “e” to have fewer bits than “q”. Make sense?
Huffman Coding is one encoding algorithm. There’s an example implementation here, which assigns the code
100to “e” and1111110010to “q”. The more “uneven” your symbol distribution is, the fewer bits it will take, on average, to transmit your message (meaning you’ll have a lower entropy). The entropy value is a lower bound for the actual weighted average of the symbol lengths. There are special cases where some encoding algorithms get closer to the entropy value than others, but none will ever surpass it.The actual entropy formula is:
\[ H(x)=-\sum_{i=0}^{N-1} p_i log_2(p_i) \]
Where \( H(x) \) is the entropy, and \( p_i \) is the probability of that symbol \( i \) will appear. In the example I linked to above, \( p_e = 0.124 \) and \( p_q=0.0009 \), so it makes sense that e’s symbol is so much shorter. In the example, the average number of bits per symbol is \( \sum S_i p_i = 4.173 \frac{bits}{symbol} \), where \( S_i \) is the number of bits in the symbol. The entropy, from the above equation, is \( 4.142 \frac{bits}{symbol} \).
The example problem of web traffic distribution is a little different. We’re not actually encoding anything, but rather trying to make the theoretical lower bound for average number of bits/signal as high as possible.
We can consider each server to be a symbol, and the amount of traffic that it recieves is decided by the hash function that we’re trying to choose. If one of our servers is the equivalent of the letter “e”, it’s going to be totally overloaded while the “q” isn’t going to be handling much traffic at all. We want each symbol (server) to appear (receive traffic) equally often.
So to calculate the entropy, we’ll take a histogram of the hash values with 20 buckets (representing the 20 servers). That will give us the number of requests that go to each server. Dividng that by the total number of requests gives us each server’s probability of handing the next incoming request. These are the \( p_i \) values that we need in order to calculate the entropy. In code, it looks like this:
calc.entropy <- function(hash.value) { h = hist(hash.value, plot = FALSE, breaks = seq(0, 2^32, length.out = 21)) probs = h$counts/sum(h$counts) print(probs) entropy = -sum(probs * log2(probs)) }The entropy values for our four hash functions are:
## hash.function entropy ## 1 H1 4.203 ## 2 H2 4.226 ## 3 H3 4.254 ## 4 H4 4.180And while we’re at it, here’s the entropy of our best/worst case example that we plotted earlier.
## hash.function entropy ## 1 near.ideal 4.309 ## 2 worst.possible 0.000Why is the worst case value 0? Because if all traffic is going to one server, we wouldn’t need any bits at all to tell us which server the request is going to. The theoretical limit for a histogram with 20 buckets is: \( -20\frac{1}{20}log_2{\frac{1}{20}} = 4.32 \), which we’re close to but can never exceed.
All of our hash functions appear to be working pretty well, especially for such a small sample size that I’m using for this blog post. It looks like our winner is H3.
To summarize here’s what we did to find the optimal hashing function:
- Get a representative sample of your web traffic
- Run each request through the hashing function
- Take a histogram of the resulting values with N bins, where N is the number of servers you have available
- Divide the bin counts by the total number of requests in your sample to get the probability of handing a request for each server
- Calculate the entropy, \( H(x)=-\sum_{i=0}^{N-1} p_i log_2(p_i) \), for each hash function
There are more considerations to take, like setting an upper bound on the value of \( p_i \) to ensure that no single server ever gets so busy that it can’t handle its load.
If you want to read up more on information theory, Elements of Information Theory by Thomas Cover and Joy Thomas is an excellent book that is reasonably priced (used) on Amazon.
There is also, of course, Claude Shannon’s landmark paper from 1948 “A Mathematical Theory of Communication”, in which he essentially defines the entire field.
(via engineering)
Ubiquity: A simple starting question: What is “complexity”?
Melanie Mitchell: I would call this a “deceptively simple” question—in fact, it is one of the most difficult questions of all! The field of complexity arose out of the strong feeling of some scientists that there are deep similarities among certain highly “complex” systems in nature, society, and technology. Examples of such systems include the brain, the immune system, cells, insect societies, economies, the World Wide Web, and so on. By “similarities,” I don’t mean that there are necessarily a single set of principles that governs these disparate systems, but rather that all these systems exhibit behavior that has been described as “adaptive,” “life-like,” “intelligent,” and “emergent.” None of these terms have precise meanings, yet, which makes a formal definition of “complex system” impossible at this time. One informal (and somewhat circular) definition of complex system is the following: A system with large numbers of interacting components, in which the components are relatively simple compared with the system as a whole, in which there is no central control or global communication among the components, and in which the interactions among the components gives rise to complex behavior. Here, “complex behavior” refers to the informal terms (e.g., adaptive, emergent) that I listed above.
Ubiquity: Is there a science of complexity?
MM: I think of the field of complexity as a loose grouping of different disciplines that study such complex systems and seek to elucidate common principles among these systems. More than 100 years ago, the philosopher and psychologist William James said psychology is not yet a science but rather “the hope of a science.” I think the same thing can be said today of complexity. I personally try to avoid the term “complexity science” and instead use the term “the sciences of complexity.”
—
(via statest)
Some cute applications of computability theory:
- We now know that, if an alien with enormous computational powers came to Earth, it could prove to us whether White or Black has the winning strategy in chess. To be convinced of the proof, we would not have to trust the alien or its exotic technology, and we would not have to spend billions of years analyzing one move sequence after another. We’d simply have to engage in a short conversation with the alien about the sums of certain polynomials over finite fields.
- There’s a finite (and not unimaginably-large) set of boxes, such that if we knew how to pack those boxes into the trunk of your car, then we’d also know a proof of the Riemann Hypothesis. Indeed, every formal proof of the Riemann Hypothesis with at most (say) a million symbols corresponds to some way of packing the boxes into your trunk, and vice versa. Furthermore, a list of the boxes and their dimensions can be feasibly written down.
- Supposing you do prove the Riemann Hypothesis, it’s possible to convince someone of that fact, without revealing anything other than the fact that you proved it. It’s also possible to write the proof down in such a way that someone else could verify it, with very high confidence, having only seen 10 or 20 bits of the proof.
- If every second or so your computer’s memory were wiped completely clean, except for the input data; the clock; a static, unchanging program; and a counter that could only be set to 1, 2, 3, 4, or 5, it would still be possible (given enough time) to carry out an arbitrarily long computation — just as if the memory weren’t being wiped clean each second. This is almost certainly not true if the counter could only be set to 1, 2, 3, or 4. The reason 5 is special here is pretty much the same reason it’s special in Galois’ proof of the unsolvability of the quintic equation.
(Source: studeo, via isomorphismes)
Don’t forget the time you spend finding the chart to look up what you save. And the time spent reading this reminder about the time spent. And the time trying to figure out if either of those actually make sense. Remember, every second counts toward your life total, including these right now.
This is important for those of us who love writing automation scripts.
(via proofmathisbeautiful)
//Pixel Sorting - Ha Long Bay
int threshold;
String url;
PImage img;
void setup(){
url = “http://farm2.staticflickr.com/1274/4681305225_fb217325c8_b.jpg”;
img = loadImage(url);
size(img.width,img.height);
frameRate(12);
threshold = 135;
}
void draw(){
img.loadPixels();
for (int k=width; k<width*height-width; k++){
if (threshold < 153) {
if (brightness(img.pixels[k]) > threshold){
img.pixels[k-width] =
color(int(red(img.pixels[k-1])),
int(green(img.pixels[k-1])),
int(blue(img.pixels[k-1])));
} else {
img.pixels[k] =
color(int(red(img.pixels[k])),
int(green(img.pixels[k])),
int(blue(img.pixels[k])));
}
threshold = threshold+1;
} else {
threshold = 135;
}
}
img.updatePixels();
image(img,0,0);
}
THIS HAS BEEN A LIFE CHANGING POST BROUGHT TO YOU IN PART BY MY HANGOVER.
April 1, 1976: Apple is Founded
On this day in 1976, Steve Jobs, Steve Wozniak, and Ronald Wayne founded Apple, a business dedicated to selling personal computers.The three founders worked in Jobs’ parents’ garage and developed their first product, Apple I. The following July, Apple introduced 200 personal computers to the market and sold them for $666.66.
Throughout the years, Apple has gained its title as a global leader in the consumer electronics industry due to its iconic products such as the iPhone, iPod, iMac, and iPad.
In 2011, the world had to say goodbye to Apple’s leading force - Steve Jobs. Take a moment to remember the life of Steve Jobs with PBS NewsHour’s special report.
Image (from top to bottom): Steve Wozniak and Steve Jobs (Tony Avelar/Bloomberg via Getty Images), An early Apple Macintosh computer c. 1981 (Bertrand LAFORET/Gamma-Rapho via Getty Images)
(via neeraj-jain)
i learned a LOT of new technical skills and got practice with my organizational skills. we scored first place, by a landslide (according to the judges), in the inject category, which i was completely in charge of
and apparently i was THE BEST in terms of helpdesk customer service.
another one of our team members was told that he gave the best presentation to the board of executives, which is REALLY AWESOME for him especially since he’s a first year!! at the very least it’s a great self esteem boost.
our team got to spend a lot of time together and i definitely know all of them so much better than before,and i like all of them! plans are to grab dinner together some time this week!
some of us had a really good discussion about privilege and classism, which was actually started by a good friend who has been somewhat ignorant about his privilege there in the past
had a good discussion with the bf about privilege as well, and he made some really good points that i hadn’t even thought of before
when we were really, really frustrated with the lack of transparency in scoring and other stuff, we came together as a team, expressed differing opinions, and compromised to come to a decision together, which ended up with us addressing our concerns with the organizers.
and when the black/white/red teams were being really unprofessional and rude, we didn’t laugh at their jokes, and refused to engage them / stoop to their level when they were continuously harassing us
so ya, not all bad, but it’s kind of lol-y to me that some of these good things were only possible because of how badly the competition was set up…
Much congrats to Berke1337 on a 2nd place finish at the Collegiate Cyber Defense Competition! :D
How Computer-Generated Animations Were Made, Circa 1964
Interesting computer-made presentation demonstrating the earlier concepts of computer graphics. It is 15 minutes long, silent, and very slow moving, but from a digital literacy perspective, essential watching:
This film explains how the computer scientists and mathematicians at Bell Labs created early computer graphics films, like most (though not all) of these films, made by Bell Labs employees E.E. Zajac, A. Michael Noll, Ken Knowlton, Frank Sinden, and many others.
This film, A Computer Technique For the Production of Animated Movies, from 1964, gives the basics on the process, from Ken Knowlton’s BEFLIX programming language for a raster-scan (bitmap) output, to the hardware details (IBM 7094 mainframe, Stromberg-Carlson 4020 microfilm printer).
(via jtotheizzoe)
digg:
This is a good way to recycle old iMacs.
Where’s the “Flower Power” one? Regardless, want.
Except for the looming fear of being bludgeoned from above, not too bad.
The iMac G3 weighed 35-40 pounds, but just the plastic casing probably doesn’t weigh excessively much. Also, colors!
