In undergrad I wrote a tutorial on #Julia programming language in which I analyzed the output of the Collatz function. The Collatz Conjecture was really fascinating to me due to its seemingly simple wording and almost impossible to solve mystery. Since then, #Julia changed significantly, and Terence Tao made a contribution that gets us closer to the proof of the Collatz Conjecture.

View on Julia Notebook nbviewer: https://lnkd.in/eGnHgy2 Update: In September 2019 Tao published a paper in which he proved that the Collatz Conjecture is “almost true” for “almost all numbers.”

Tao’s Paper: https://lnkd.in/eCus8YU

Tao’s Blog Post: https://lnkd.in/eb7ePAu

• Juno from http://junolab.org/

• IJulia kernel for Jyputer notebook from https://github.com/JuliaLang/IJulia.jl

Juno is a good IDE for writing and evaluating julia code quickly. IJulia notebook is good for writing tutorials and reports with julia code results embeded in the document.

Once you’ve installed everything I recommend opening up the Juno IDE and going through the tutorial.

## Quick start

I execute all julia code below in Juno. I suggest you create a folder on your desktop and make it your working directory where we will be able to write files. First, a couple of basic commands. To evaluate code in Juno you just need to press Ctrl-D (its in the Juno tutrial):

VERSION # print julia version number

pwd() # print working directory
homedir() # print the default home directory

# cd(pwd()) # set working directory to DirectoryPath

"/Users/kobakhitalishvili"

3+5 # => 8
5*7 # => 35
3^17 # => 129140163
3^(1+3im) # im stands for imaginary number => -2.964383781426573 - 0.46089998526262876im
log(7) # natural log of 7 => 1.9459101490553132

1.9459101490553132

1.9459101490553132


Interesting that julia has imaginary number built in. Now, variables and functions:

a = cos(pi) + im * sin(pi) # assigning to a variable

-1.0 + 1.2246467991473532e-16im

b = ℯ^(im*pi)

-1.0 + 1.2246467991473532e-16im

a == b # boolean expression. It is an euler identity.

true


Lets see how to define functions. Here is a chapter on functions in julia docs for more info.

plus2(x) = x + 2 # a compact way

function plustwo(x) # traditional function definition
return x+2
end

plustwo (generic function with 1 method)


plus2(11)

13

plustwo(11)

13


Here is a julia cheatsheet with above and additional information in a concise form. Next, lets write a function that will generate some data which we will write to a csv file, plot, and save the plot.

## Data frames, plotting, and file Input/Output

I decided to write a function $f(x)$ that performs the process from the Collatz conjecture. Basically, for any positive integer $x$ if $x$ is even divide by $2$, if x is odd multiply by $3$ and add $1$. Repeat the process until you reach one. The Collatz conjecture proposes that regardless of what number you start with you will always reach one. Here it is in explicit form

$f(x) = \begin{cases} x/2, & \mbox{if } x\mbox{ is even} \\ 3x+1, & \mbox{if } x\mbox{ is odd} \end{cases}$

The function collatz(x) will count the number of iterations it took for the starting number to reach $1$.

function collatz(x)
# Given a number x
# - divide by 2 if x is even
# - multiply by 3 and add 1 if x is odd
# until x reaches 1
# Args:
# - param x: integer
# - return: integer
count = 0
while x != 1
if x % 2 == 0
x = x/2
count += 1
else
x = 3*x + 1
count += 1
end
end
return count
end

collatz(2)

1

collatz(3)

3


### Data frames

Now, let’s create a dataframe with the number of steps needed to complete the Collatz process for each number from 1 to 1000. We will use the DataFrames package because the base julia library does not have dataframes.

# install DataFrames package
using Pkg

using DataFrames

# Before populating the dataframe with collatz data lets see how to create one
df = DataFrame(Col1 = 1:10, Col2 = ["a","b","c","d","e","f","a","b","c","d"])
df


10 rows × 2 columns

Col1Col2
Int64String
11a
22b
33c
44d
55e
66f
77a
88b
99c
1010d
# Neat. Now let's generate data using collatz function
df = DataFrame(Number = 1:1000, NumofSteps = map(collatz,1:1000))
first(df, 10)


10 rows × 2 columns

NumberNumofSteps
Int64Int64
110
221
337
442
555
668
7716
883
9919
10106

map() applies collatz() function to every number in the 1:1000 array which is an array of numbers [1,2,3,...,1000]. In this instance map() returns an array of numbers that is the output of collatz() function.

# To get descriptive statistics
describe(df)


2 rows × 8 columns

variablemeanminmedianmaxnuniquenmissingeltype
SymbolFloat64Int64Float64Int64NothingNothingDataType
1Number500.51500.51000Int64
2NumofSteps59.542043.0178Int64

Before we save it lets categorize the points based on whether the original number is even or odd.

df.evenodd = map(x -> if x % 2 == 0 "even" else "odd" end, 1:1000) # create new evenodd column
# rename!(df, :x1, :evenodd) #rename it to evenodd
first(df,5)


5 rows × 3 columns

NumberNumofStepsevenodd
Int64Int64String
110odd
221even
337odd
442even
555odd

I use the map() function with an anonymous function x -> if x % 2 == 0 "even" else "odd" end which checks for divisibility by 2 to create a column with “even” and “odd” as entries. Finally, I rename the new column “evenodd”.

Additionally, let’s identify the prime numbers as well.

# Pkg.add("Primes")
using Primes

isprime(3)

true

df.isprime = map(x -> if isprime(x) "yes" else "no" end,df.Number)
first(df,5)


5 rows × 4 columns

NumberNumofStepsevenoddisprime
Int64Int64StringString
110oddno
221evenyes
337oddyes
442evenno
555oddyes
# Pkg.add("CSV")
using CSV

# To save the data frame in the working directory
CSV.write("collatz.csv", df)

"collatz.csv"


### Plotting data

To plot the data we will use the Gadfly package. Gadly resembles ggplot in its functionality. There is also the Plots package which brings mutliple plotting libraries into a single API.

To save plots in different image formats we will need the Cairo pakage.

# Pkg.add(["Cairo","Fontconfig","Plots", "Gadfly","PlotlyJS","ORCA"])

# using Plots
# plotlyjs() use plotly backend
# pyplot() use pyplot backend
using Cairo

Gadfly.plot(df,x=:Number, y=:NumofSteps, color = :evenodd, Geom.point) # assign plot to variable