Getting started with the Julia programming language
In undergrad I wrote a tutorial on the Julia programming language in which I analyzed the output of the Collatz function. The Collatz Conjecture fascinated me because of its deceptively simple statement and its seemingly intractable proof. Since then, Julia has changed significantly, and Terence Tao has made a contribution that brings us closer to a proof of the Collatz Conjecture.
View the Julia notebook on nbviewer: https://lnkd.in/eGnHgy2
Update: In September 2019 Tao published a paper in which he proved that the Collatz Conjecture is “almost true” for “almost all numbers.”
Tao’s Paper: https://lnkd.in/eCus8YU
Tao’s Blog Post: https://lnkd.in/eb7ePAu
Download
Download Julia from http://julialang.org/
Download Julia IDEs:
- Juno from http://junolab.org/
- IJulia kernel for Jupyter notebook from https://github.com/JuliaLang/IJulia.jl
Juno is a good IDE for writing and evaluating Julia code quickly. An IJulia notebook is good for writing tutorials and reports with Julia code results embedded in the document.
Once you've installed everything, I recommend opening the Juno IDE and working through its tutorial.
Quick start
I execute all the Julia code below in Juno. I suggest creating a folder on your desktop and making it your working directory so we have a place to write files. First, a couple of basic commands. To evaluate code in Juno, press Ctrl-D (it's covered in the Juno tutorial):
VERSION # print julia version number
pwd() # print working directory
homedir() # print the default home directory
# cd("DirectoryPath") # set the working directory to DirectoryPath
"/Users/kobakhitalishvili"
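If you prefer to create the working folder from Julia itself, here is a minimal sketch. The helper setup_workdir and the folder name julia-tutorial are hypothetical; mkpath, joinpath, cd, and pwd are Base functions.

```julia
# Hypothetical helper: create `dir` (and any missing parent folders),
# make it the working directory, and return the new working directory.
function setup_workdir(dir::AbstractString)
    mkpath(dir)   # does nothing if the folder already exists
    cd(dir)       # change the working directory
    return pwd()
end

# Hypothetical folder on the desktop; adjust the path for your system.
# setup_workdir(joinpath(homedir(), "Desktop", "julia-tutorial"))
```

The call is left commented out so that running the block does not touch your desktop until you choose a path.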
3+5 # => 8
5*7 # => 35
3^17 # => 129140163
3^(1+3im) # im is the imaginary unit => -2.964383781426573 - 0.46089998526262876im
log(7) # natural log of 7 => 1.9459101490553132
1.9459101490553132
Interestingly, Julia has complex numbers built in. Now, variables and functions:
a = cos(pi) + im * sin(pi) # assigning to a variable
-1.0 + 1.2246467991473532e-16im
b = ℯ^(im*pi) # ℯ is Euler's number (type \euler then Tab)
-1.0 + 1.2246467991473532e-16im
a == b # boolean expression: checks Euler's formula e^(iπ) = cos(π) + i sin(π)
true
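The tiny imaginary part 1.2246467991473532e-16 appears because pi is rounded to the nearest Float64, so sin(pi) is not exactly zero. a == b works because both sides carry the same rounding, but comparing against the exact value -1 needs a tolerance. A minimal sketch using isapprox (written ≈) from Base:

```julia
a = cos(pi) + im * sin(pi)

# Exact comparison fails: the imaginary part is about 1.2e-16, not 0.
exact = (a == -1)

# isapprox (≈) compares within a relative tolerance,
# by default about the square root of Float64 machine epsilon.
close_enough = a ≈ -1

println("a == -1: ", exact)        # prints "a == -1: false"
println("a ≈ -1: ", close_enough)  # prints "a ≈ -1: true"
```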
Let's see how to define functions. See the chapter on functions in the Julia docs for more info.
plus2(x) = x + 2 # a compact way
function plustwo(x) # traditional function definition
    return x + 2
end
plustwo (generic function with 1 method)
plus2(11)
13
plustwo(11)
13
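Besides the two named forms above, Julia also has anonymous functions (used with map() later in this tutorial) and a dot syntax that applies any function element-wise. A minimal sketch; the name plus2_anon is hypothetical:

```julia
plus2(x) = x + 2                 # compact (assignment) form

function plustwo(x)              # traditional form
    return x + 2
end

plus2_anon = x -> x + 2          # anonymous function bound to a variable

# All three give the same result for the same input.
println(plus2(11), " ", plustwo(11), " ", plus2_anon(11))  # prints "13 13 13"

# The dot syntax broadcasts a function over every element of a collection.
println(plus2.([1, 2, 3]))       # prints "[3, 4, 5]"
```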
Here is a Julia cheat sheet with the above and additional information in concise form. Next, let's write a function that generates some data, which we will write to a CSV file, plot, and save as an image.
Data frames, plotting, and file Input/Output
I decided to write a function $f(x)$ that performs the process from the Collatz conjecture: for any positive integer $x$, if $x$ is even, divide it by $2$; if $x$ is odd, multiply it by $3$ and add $1$. Repeat the process until you reach $1$. The Collatz conjecture states that no matter which positive integer you start with, you will always reach $1$. Here it is in explicit form:
\[f(x) = \begin{cases} x/2, & \mbox{if } x\mbox{ is even} \\ 3x+1, & \mbox{if } x\mbox{ is odd} \end{cases}\]
The function collatz(x) counts the number of iterations it takes for the starting number to reach $1$.
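As a worked example, applying $f$ repeatedly starting from $x = 6$ gives

\[6 \to 3 \to 10 \to 5 \to 16 \to 8 \to 4 \to 2 \to 1\]

There are 8 arrows, so 6 needs 8 iterations to reach $1$.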
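function collatz(x::Integer)
    # Given a positive integer x, repeatedly
    # - divide by 2 if x is even
    # - multiply by 3 and add 1 if x is odd
    # until x reaches 1
    # Args:
    # - param x: positive integer
    # - return: number of steps (integer)
    x > 0 || throw(ArgumentError("x must be a positive integer"))
    count = 0
    while x != 1
        if iseven(x)
            x = x ÷ 2 # integer division keeps x an integer (x/2 would give a Float64)
        else
            x = 3x + 1
        end
        count += 1
    end
    return count
end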
collatz(2)
1
collatz(3)
7
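To see the actual path rather than only its length, a small variant can collect every value visited. A minimal sketch; collatz_trajectory is a hypothetical helper, not part of the original code:

```julia
# Hypothetical helper: return every value visited from x down to 1.
function collatz_trajectory(x::Integer)
    x > 0 || throw(ArgumentError("x must be a positive integer"))
    path = [x]
    while x != 1
        x = iseven(x) ? x ÷ 2 : 3x + 1   # one Collatz step, staying in integers
        push!(path, x)
    end
    return path
end

println(collatz_trajectory(3))   # prints "[3, 10, 5, 16, 8, 4, 2, 1]"
```

The number of steps is length(path) - 1, which for 3 gives 7, the same as collatz(3).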
Data frames
Now, let's create a data frame with the number of steps needed to complete the Collatz process for each number from 1 to 1000. We will use the DataFrames package because Base Julia does not include data frames.
# install DataFrames package
using Pkg
# Pkg.add("DataFrames")
using DataFrames
# Before populating the data frame with Collatz data, let's see how to create one
df = DataFrame(Col1 = 1:10, Col2 = ["a","b","c","d","e","f","a","b","c","d"])
df
Row | Col1 (Int64) | Col2 (String) |
---|---|---|
1 | 1 | a |
2 | 2 | b |
3 | 3 | c |
4 | 4 | d |
5 | 5 | e |
6 | 6 | f |
7 | 7 | a |
8 | 8 | b |
9 | 9 | c |
10 | 10 | d |
# Neat. Now let's generate data using the collatz function
df = DataFrame(Number = 1:1000, NumofSteps = map(collatz,1:1000))
first(df, 10)
Row | Number (Int64) | NumofSteps (Int64) |
---|---|---|
1 | 1 | 0 |
2 | 2 | 1 |
3 | 3 | 7 |
4 | 4 | 2 |
5 | 5 | 5 |
6 | 6 | 8 |
7 | 7 | 16 |
8 | 8 | 3 |
9 | 9 | 19 |
10 | 10 | 6 |
map() applies the collatz() function to every number in 1:1000. The range 1:1000 behaves like the array [1, 2, 3, ..., 1000] without storing every element. Here map() returns an array containing the output of collatz() for each number.
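The same column can also be built with dot broadcasting (collatz.(1:1000)) or an array comprehension. A minimal sketch that repeats a compact collatz definition (without the input check) so the block runs on its own:

```julia
# Compact version of collatz() from above, repeated so this block is self-contained.
function collatz(x::Integer)
    count = 0
    while x != 1
        x = iseven(x) ? x ÷ 2 : 3x + 1
        count += 1
    end
    return count
end

steps_map   = map(collatz, 1:10)          # apply collatz to each element
steps_dot   = collatz.(1:10)              # broadcasting with the dot syntax
steps_compr = [collatz(n) for n in 1:10]  # array comprehension

println(steps_map)                              # prints "[0, 1, 7, 2, 5, 8, 16, 3, 19, 6]"
println(steps_map == steps_dot == steps_compr)  # prints "true"
```

All three return a Vector{Int64}; which one to use is mostly a matter of taste.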
# To get descriptive statistics
describe(df)
Row | variable (Symbol) | mean (Float64) | min (Int64) | median (Float64) | max (Int64) | nunique (Nothing) | nmissing (Nothing) | eltype (DataType) |
---|---|---|---|---|---|---|---|---|
1 | Number | 500.5 | 1 | 500.5 | 1000 | | | Int64 |
2 | NumofSteps | 59.542 | 0 | 43.0 | 178 | | | Int64 |
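describe() comes from DataFrames. If you only need a few of these statistics for one column, the Statistics standard library computes them directly on a vector. A minimal sketch that recomputes the NumofSteps row without DataFrames (collatz is repeated in compact form so the block runs on its own):

```julia
using Statistics  # standard library: mean, median

function collatz(x::Integer)
    count = 0
    while x != 1
        x = iseven(x) ? x ÷ 2 : 3x + 1
        count += 1
    end
    return count
end

steps = map(collatz, 1:1000)

println("mean:   ", mean(steps))
println("median: ", median(steps))
println("min:    ", minimum(steps))
println("max:    ", maximum(steps))

# argmax returns the index of the largest value; because steps[i] belongs to
# the number i, it is also the starting number with the longest sequence.
println("longest sequence starts at: ", argmax(steps))
```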
Before we save the data, let's categorize each number based on whether it is even or odd.
df.evenodd = map(x -> if x % 2 == 0 "even" else "odd" end, df.Number) # create a new column named evenodd
first(df,5)
Row | Number (Int64) | NumofSteps (Int64) | evenodd (String) |
---|---|---|---|
1 | 1 | 0 | odd |
2 | 2 | 1 | even |
3 | 3 | 7 | odd |
4 | 4 | 2 | even |
5 | 5 | 5 | odd |
I use the map() function with the anonymous function x -> if x % 2 == 0 "even" else "odd" end, which checks for divisibility by 2, to create a column with “even” and “odd” as entries. Assigning to df.evenodd both creates the column and names it “evenodd”, so no separate rename is needed.
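The same labels can be written more compactly with iseven and the ternary operator cond ? a : b, either through map or through broadcasting. A minimal sketch on a plain range rather than the data frame column:

```julia
numbers = 1:5

# Ternary form of the anonymous function used above.
labels_map = map(x -> iseven(x) ? "even" : "odd", numbers)

# ifelse.() broadcasts over iseven.(numbers); strings are treated as scalars.
labels_dot = ifelse.(iseven.(numbers), "even", "odd")

println(labels_map)                # prints ["odd", "even", "odd", "even", "odd"]
println(labels_map == labels_dot)  # prints "true"
```

Either expression can be assigned to df.evenodd in place of the original if-expression.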
Additionally, let's identify the prime numbers.
# Pkg.add("Primes")
using Primes
isprime(3)
true
df.isprime = map(x -> if isprime(x) "yes" else "no" end,df.Number)
first(df,5)
Row | Number (Int64) | NumofSteps (Int64) | evenodd (String) | isprime (String) |
---|---|---|---|---|
1 | 1 | 0 | odd | no |
2 | 2 | 1 | even | yes |
3 | 3 | 7 | odd | yes |
4 | 4 | 2 | even | no |
5 | 5 | 5 | odd | yes |
# Pkg.add("CSV")
using CSV
# To save the data frame in the working directory
CSV.write("collatz.csv", df)
"collatz.csv"
Plotting data
To plot the data we will use the Gadfly package. Gadfly is similar to R's ggplot2, since both are based on the grammar of graphics. There is also the Plots package, which brings multiple plotting libraries together under a single API. To save plots in different image formats we will need the Cairo package.
# Pkg.add(["Cairo","Fontconfig","Plots", "Gadfly","PlotlyJS","ORCA"])
# Pkg.add("Gadfly")
# using Plots
# plotlyjs() # use the PlotlyJS backend
# pyplot() # use the PyPlot backend
using Cairo
using Gadfly
Gadfly.plot(df,x=:Number, y=:NumofSteps, Geom.point, Guide.title("Collatz Conjecture"))
[Figure: scatter plot of NumofSteps versus Number, titled "Collatz Conjecture"]
Looks pretty. I will color the points based on whether the original number is even or odd.
p = Gadfly.plot(df, x=:Number, y=:NumofSteps, color=:evenodd, Geom.point) # assign plot to variable
draw(PNG("collatz.png", 6inch, 4inch), p) # save the plot as a PNG file (uses Cairo)
[Figure: scatter plot of NumofSteps versus Number, points colored by evenodd]